Description
Expected Behavior
I was trying to use CLBlast in koboldcpp after verifying that it works with llama.cpp, so I expected koboldcpp to detect and use my GPU the same way.
Current Behavior
It throws ggml_opencl: clGetPlatformIDs(NPLAT, platform_ids, &n_platforms) error -1001 at ggml-opencl.cpp:968
no matter which platform or device I choose. Error -1001 is CL_PLATFORM_NOT_FOUND_KHR, i.e. the OpenCL ICD loader didn't find any platforms at all.
In llama.cpp I was able to select my GPU using GGML_OPENCL_PLATFORM=Clover GGML_OPENCL_DEVICE=1 ./main ...
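For reference, the failing call can be reproduced outside koboldcpp with a minimal standalone program (a sketch, assuming the OpenCL headers and ICD loader are installed; check_platforms.cpp is just an illustrative filename):

```cpp
// check_platforms.cpp - minimal reproduction of the call that fails
// at ggml-opencl.cpp:968. Build with e.g.:
//   g++ check_platforms.cpp -lOpenCL -o check_platforms
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platform_ids[16];
    cl_uint n_platforms = 0;
    cl_int err = clGetPlatformIDs(16, platform_ids, &n_platforms);
    if (err != CL_SUCCESS) {
        // -1001 (CL_PLATFORM_NOT_FOUND_KHR) means the ICD loader
        // found no OpenCL platforms at all.
        printf("clGetPlatformIDs failed: %d\n", err);
        return 1;
    }
    for (cl_uint i = 0; i < n_platforms; i++) {
        char name[128] = {0};
        clGetPlatformInfo(platform_ids[i], CL_PLATFORM_NAME,
                          sizeof(name), name, nullptr);
        printf("Platform #%u: %s\n", i, name);
    }
    return 0;
}
```

If this prints Clover just like clinfo does, that would suggest the platforms are visible to the system loader, and the problem is specific to how koboldcpp_clblast.so is linked or loaded.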
Environment and Context
I'm using an Ubuntu-based system (Pop!_OS) on a laptop with an AMD APU and an AMD dGPU; the dGPU isn't great, but it's still faster than the CPU.
Failure Information (for bugs)
I dug into the code a bit to find out what was happening, and it seems koboldcpp can't find any OpenCL platforms or devices.
Output of clinfo -l:
Platform #0: Clover
+-- Device #0: ICELAND (iceland, LLVM 15.0.7, DRM 3.52, 6.4.6-76060406-generic)
`-- Device #1: AMD Radeon Vega 8 Graphics (raven, LLVM 15.0.7, DRM 3.52, 6.4.6-76060406-generic)
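For context, this is roughly what the GGML_OPENCL_PLATFORM=Clover GGML_OPENCL_DEVICE=1 selection above would do against this listing (an illustrative sketch, not llama.cpp's exact implementation; select_device.cpp is a hypothetical filename):

```cpp
// select_device.cpp - illustrative sketch of platform/device selection
// via the GGML_OPENCL_PLATFORM / GGML_OPENCL_DEVICE env vars that worked
// with llama.cpp (an approximation, not the exact llama.cpp code).
// Build with e.g.: g++ select_device.cpp -lOpenCL -o select_device
#include <CL/cl.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const char *want_platform = getenv("GGML_OPENCL_PLATFORM"); // e.g. "Clover"
    const char *want_device   = getenv("GGML_OPENCL_DEVICE");   // e.g. "1"

    cl_platform_id platforms[16];
    cl_uint n_platforms = 0;
    if (clGetPlatformIDs(16, platforms, &n_platforms) != CL_SUCCESS || n_platforms == 0) {
        fprintf(stderr, "no OpenCL platforms found\n");
        return 1;
    }

    // Pick the platform whose name contains GGML_OPENCL_PLATFORM, else #0.
    cl_platform_id chosen = platforms[0];
    for (cl_uint i = 0; i < n_platforms; i++) {
        char name[128] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, nullptr);
        if (want_platform && strstr(name, want_platform)) {
            chosen = platforms[i];
        }
    }

    // Pick the device at index GGML_OPENCL_DEVICE, else #0.
    cl_device_id devices[16];
    cl_uint n_devices = 0;
    if (clGetDeviceIDs(chosen, CL_DEVICE_TYPE_ALL, 16, devices, &n_devices) != CL_SUCCESS
            || n_devices == 0) {
        fprintf(stderr, "no devices on the chosen platform\n");
        return 1;
    }
    cl_uint idx = want_device ? (cl_uint)atoi(want_device) : 0;
    if (idx >= n_devices) idx = 0;

    char dev_name[128] = {0};
    clGetDeviceInfo(devices[idx], CL_DEVICE_NAME, sizeof(dev_name), dev_name, nullptr);
    printf("selected device #%u: %s\n", idx, dev_name);
    return 0;
}
```

With the clinfo listing above, GGML_OPENCL_DEVICE=1 would land on the AMD Radeon Vega 8 Graphics device.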
Steps to Reproduce
- Own a laptop with an AMD APU
- Try to use koboldcpp with CLBlast
Failure Logs
(base) pedrohenrique@pop-os:~/Gitclone/koboldcpp$ python koboldcpp.py --model models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin --useclblast 0 0
***
Welcome to KoboldCpp - Version 1.42.1
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.so
==========
Namespace(model='models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin', model_param='models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=3, blasthreads=3, psutil_set_threads=False, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], stream=False, smartcontext=False, unbantokens=False, bantokens=None, usemirostat=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=[0, 0], usecublas=None, gpulayers=0, tensor_split=None)
==========
Loading model: /home/pedrohenrique/Gitclone/koboldcpp/models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin
[Threads: 3, BlasThreads: 3, SmartContext: False]
---
Identified as LLAMA model: (ver 5)
Attempting to Load...
---
Using automatic RoPE scaling (scale:1.000, base:10000.0)
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
llama.cpp: loading model from /home/pedrohenrique/Gitclone/koboldcpp/models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin
llama_v3_model_load_internal: format = ggjt v3 (latest)
llama_v3_model_load_internal: n_vocab = 32000
llama_v3_model_load_internal: n_ctx = 2048
llama_v3_model_load_internal: n_embd = 4096
llama_v3_model_load_internal: n_mult = 256
llama_v3_model_load_internal: n_head = 32
llama_v3_model_load_internal: n_head_kv = 32
llama_v3_model_load_internal: n_layer = 32
llama_v3_model_load_internal: n_rot = 128
llama_v3_model_load_internal: n_gqa = 1
llama_v3_model_load_internal: rnorm_eps = 5.0e-06
llama_v3_model_load_internal: n_ff = 11008
llama_v3_model_load_internal: freq_base = 10000.0
llama_v3_model_load_internal: freq_scale = 1
llama_v3_model_load_internal: ftype = 2 (mostly Q4_0)
llama_v3_model_load_internal: model size = 7B
llama_v3_model_load_internal: ggml ctx size = 0.09 MB
ggml_opencl: clGetPlatformIDs(NPLAT, platform_ids, &n_platforms) error -1001 at ggml-opencl.cpp:968
You may be out of VRAM. Please check if you have enough.
If it didn't work with llama.cpp either, I wouldn't even have bothered to open an issue, since it could have been a problem with my laptop.
But seeing that it works well with llama.cpp, I just don't understand why it doesn't with koboldcpp.