
llama.cpp: loading model ......terminate called after throwing an instance of 'std::runtime_error' #303

Closed

Description

@mikeyang01

langchain version: 0.0.184

Error happens on:
llama-cpp-python versions: 0.1.53 through 0.1.56

Error detail:

llama.cpp: loading model from /root/models/ggml-vic7b-q4_0.bin
terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

Works correctly on:
llama-cpp-python version: 0.1.52

Correct output:

llama.cpp: loading model from /root/models/ggml-vic7b-q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  72.75 KB
llama_model_load_internal: mem required  = 5809.34 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
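
To confirm which llama-cpp-python build produced each run above, a quick check with the standard library (Python 3.8+; nothing here is specific to this issue) is:

from importlib.metadata import version

# Reports the installed llama-cpp-python distribution version,
# independent of whatever the llama_cpp module itself exposes.
print(version("llama-cpp-python"))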

Source code

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
# verbose=True is required when passing a callback manager.
llm_cpp = LlamaCpp(
    model_path="/root/models/ggml-vic7b-q4_0.bin",
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=2048,
)
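
For reference, PromptTemplate and LLMChain are imported above but never used in the snippet. Once the model loads, a typical invocation through a chain would look like the sketch below (the template text and question are illustrative, not from the original report):

template = """Question: {question}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm_cpp)

# Tokens stream to stdout through the callback manager configured above.
llm_chain.run("Name the planets in the solar system.")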

My investigation:
Maybe this is related to the llama.cpp quantize issue ggml-org/llama.cpp#1569?
Any ideas why this happens?
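
If a format change in newer builds is indeed the cause, pinning the last known-good build is a workaround until the model file is re-converted (the version number comes from the report above):

pip install llama-cpp-python==0.1.52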
