Closed
Is your feature request related to a problem? Please describe.
I have a low VRAM GPU and would like to execute the python binding. I can run LLaMA, thanks to https://gist.github.com/rain-1/8cc12b4b334052a21af8029aa9c4fafc . But I didn't understand if this is possible with this binding.
Describe the solution you'd like
I want to run a 13B model on my RTX 3060.
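For reference, a rough back-of-the-envelope check of whether a 4-bit quantized 13B model could fit. This is only a sketch: the 12 GB figure assumes the 12 GB variant of the RTX 3060, and the runtime-overhead estimate is an assumption, not a measured value.

```python
# Rough VRAM estimate for a 4-bit quantized 13B model.
# Assumptions (not from the issue): 12 GB RTX 3060, and roughly
# 2 GB of extra memory for the KV cache and runtime context.
params = 13e9                   # 13B parameters
bytes_per_param = 0.5           # 4-bit quantization = half a byte per weight
weights_gb = params * bytes_per_param / 1e9

overhead_gb = 2.0               # KV cache + runtime overhead (assumption)
total_gb = weights_gb + overhead_gb

vram_gb = 12.0                  # 12 GB RTX 3060 variant (assumption)
print(f"~{total_gb:.1f} GB estimated vs {vram_gb:.0f} GB available")
```

Under these assumptions the quantized weights alone are about 6.5 GB, so a 4-bit 13B model plausibly fits in 12 GB; if only part of the model fits, bindings built on llama.cpp typically let you offload a subset of layers to the GPU and keep the rest in system RAM.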
Describe alternatives you've considered
https://gist.github.com/rain-1/8cc12b4b334052a21af8029aa9c4fafc
Additional context