Closed
Is your feature request related to a problem? Please describe.
I have a low VRAM GPU and would like to execute the python binding. I can run LLaMA, thanks to https://gist.github.com/rain-1/8cc12b4b334052a21af8029aa9c4fafc . But I didn't understand if this is possible with this binding.
Describe the solution you'd like
I want to run a 13B model on my RTX 3060.
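For reference, a rough back-of-the-envelope check of whether a 4-bit quantized 13B model could fit. This is only a sketch: the 12 GB figure assumes the 12 GB variant of the RTX 3060, and the runtime-overhead estimate is an assumption, not a measured value.

```python
# Rough VRAM estimate for a 4-bit quantized 13B model.
# Assumptions (not from the issue): 12 GB RTX 3060, and roughly
# 2 GB of extra memory for the KV cache and runtime context.
params = 13e9                   # 13B parameters
bytes_per_param = 0.5           # 4-bit quantization = half a byte per weight
weights_gb = params * bytes_per_param / 1e9

overhead_gb = 2.0               # KV cache + runtime overhead (assumption)
total_gb = weights_gb + overhead_gb

vram_gb = 12.0                  # 12 GB RTX 3060 variant (assumption)
print(f"~{total_gb:.1f} GB estimated vs {vram_gb:.0f} GB available")
```

Under these assumptions the quantized weights alone are about 6.5 GB, so a 4-bit 13B model plausibly fits in 12 GB; if only part of the model fits, bindings built on llama.cpp typically let you offload a subset of layers to the GPU and keep the rest in system RAM.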
Describe alternatives you've considered
https://gist.github.com/rain-1/8cc12b4b334052a21af8029aa9c4fafc
Additional context