Description
langchain 0.0.184
Error happens on:
llama-cpp-python versions: 0.1.53 to 0.1.56
Error detail:
```
llama.cpp: loading model from /root/models/ggml-vic7b-q4_0.bin
terminate called after throwing an instance of 'std::runtime_error'
what(): unexpectedly reached end of file
Aborted (core dumped)
```
Works correctly on:
llama-cpp-python version: 0.1.52
Correct output:
```
llama.cpp: loading model from /root/models/ggml-vic7b-q4_0.bin
llama_model_load_internal: format = ggjt v2 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 72.75 KB
llama_model_load_internal: mem required = 5809.34 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
```
Source code:
```python
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Verbose is required to pass to the callback manager
# Make sure the model path is correct for your system!
llm_cpp = LlamaCpp(
    model_path="/root/models/ggml-vic7b-q4_0.bin",
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=2048,
)
```
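To take langchain out of the picture, here is a minimal sketch that loads the same model through llama-cpp-python directly; assuming the fault is in the model loader (an assumption, not confirmed), it should abort the same way on 0.1.53 to 0.1.56:

```python
# Minimal check without langchain: load the same model directly through
# llama-cpp-python. If the loader is at fault, this should hit the same
# "unexpectedly reached end of file" abort on the broken versions.
from llama_cpp import Llama

llm = Llama(model_path="/root/models/ggml-vic7b-q4_0.bin", n_ctx=2048)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

Pinning `llama-cpp-python==0.1.52` avoids the crash for now.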
My investigation:
Maybe this is related to a llama.cpp quantization issue?
ggml-org/llama.cpp#1569
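If it is a file-format change, the model header should show it. Below is a quick sketch that reads the magic/version bytes; the 0x67676a74 ("ggjt") magic value is taken from the llama.cpp sources of that period and is an assumption worth verifying against your exact commit:

```python
# Hedged sketch: read the 4-byte magic and 4-byte file version from the
# start of the model file. 0x67676a74 ("ggjt") with version 2 matches the
# "format = ggjt v2 (latest)" line in the working log above.
import struct

with open("/root/models/ggml-vic7b-q4_0.bin", "rb") as f:
    magic, version = struct.unpack("<II", f.read(8))

print(f"magic=0x{magic:08x} version={version}")
```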
Any ideas why this happens?