Description
langchain 0.0.184
Error happens on:
llama-cpp-python versions: 0.1.53 to 0.1.56
Error detail:
```
llama.cpp: loading model from /root/models/ggml-vic7b-q4_0.bin
terminate called after throwing an instance of 'std::runtime_error'
what(): unexpectedly reached end of file
Aborted (core dumped)
```
Works correctly on:
llama-cpp-python version: 0.1.52
Correct output:
```
llama.cpp: loading model from /root/models/ggml-vic7b-q4_0.bin
llama_model_load_internal: format = ggjt v2 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 72.75 KB
llama_model_load_internal: mem required = 5809.34 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
```
Source code:
```python
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Verbose is required to pass to the callback manager
# Make sure the model path is correct for your system!
llm_cpp = LlamaCpp(
    model_path="/root/models/ggml-vic7b-q4_0.bin",
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=2048,
)
```
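To take langchain out of the picture, here is a minimal sketch that loads the same model through llama-cpp-python directly; assuming the fault is in the model loader (an assumption, not confirmed), it should abort the same way on 0.1.53 to 0.1.56:

```python
# Minimal check without langchain: load the same model directly through
# llama-cpp-python. If the loader is at fault, this should hit the same
# "unexpectedly reached end of file" abort on the broken versions.
from llama_cpp import Llama

llm = Llama(model_path="/root/models/ggml-vic7b-q4_0.bin", n_ctx=2048)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

Pinning `llama-cpp-python==0.1.52` avoids the crash for now.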
My investigation:
Maybe this is related to a llama.cpp quantization issue?
ggml-org/llama.cpp#1569
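If it is a file-format change, the model header should show it. Below is a quick sketch that reads the magic/version bytes; the 0x67676a74 ("ggjt") magic value is taken from the llama.cpp sources of that period and is an assumption worth verifying against your exact commit:

```python
# Hedged sketch: read the 4-byte magic and 4-byte file version from the
# start of the model file. 0x67676a74 ("ggjt") with version 2 matches the
# "format = ggjt v2 (latest)" line in the working log above.
import struct

with open("/root/models/ggml-vic7b-q4_0.bin", "rb") as f:
    magic, version = struct.unpack("<II", f.read(8))

print(f"magic=0x{magic:08x} version={version}")
```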
Any ideas why this happens?