Releases: ggml-org/llama.cpp
b7418
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
llama : add support for NVIDIA Nemotron 3 Nano (#18058)
- llama : add support for NVIDIA Nemotron Nano 3
This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling
the conversion and running of this model.
Co-authored-by: Georgi Gerganov [email protected]
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7415
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
ggml-hexagon: mm for mtmd (#17894)
-
feat: add run_mtmd script for hexagon
-
fix: fix issue in fp16xfp32 mm
-
fix: remove opt_experiment for fp16xfp32 mm
-
fix: ggml-hexagon: matmul fp16xfp32 support non-contigious src0
-
fix: fix syntax check for run-mtmd.sh for cli
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7414
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
model : add KORMo model (#18032)
-
vocab: add KORMo Tokenizer
-
model: add KORMoForCausalLM
-
vocab: change pretokenizer to qwen2
-
lint: fix unintended line removal
-
model: make qwen2 bias tensor optional
-
model: use qwen2 architecture for KORMo
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7413
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
kv-cache: Fix state restore fragmented cache (#17982)
- kv-cache : fix state restore with fragmented cache (#17527)
Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache.
-
tests : update logic
-
cleanup: tightened state_read_meta sig, added is_contiguous case
-
fix: state_read_meta arg reorder loose ends
Co-authored-by: Georgi Gerganov [email protected]
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7411
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
metal: use shared buffers on eGPU (#17866)
- metal: use shared buffers on eGPU
With #15906, I noticed on important regression when using metal backend on eGPU.
This commit restore the previous behavior and add an option to force its activation.
-
metal: use shared buffers on eGPU
-
metal: use shared buffers on eGPU
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7410
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
mtmd: refactor audio preprocessing (#17978)
-
mtmd: refactor audio preprocessing
-
refactor
Co-authored-by: Tarek [email protected]
-
wip
-
wip (2)
-
improve constructor
-
fix use_natural_log
-
fix padding for short input
-
clean up
-
remove need_chunking
Co-authored-by: Tarek [email protected]
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7406
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
[SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826)
-
support gpt-oss GPU by OP add-id, mul_mat for mxfp4, swiglu_oai, fix warning
-
fix fault ut case, update ops.md
-
rebase, fix format issue
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7405
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
model : add glm-asr support (#17901)
-
[model] add glm-asr support
-
fix format for ci
-
fix convert format for ci
-
update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review
-
check root architecture for convert hf script
-
fix conficlt with upstream
-
fix convert script for glm asr & format clip-impl
-
format
-
restore hparams text
-
improved conversion
Co-authored-by: Sigbjørn Skjæret [email protected]
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7404
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
preset: handle negated arg, reverse the meaning if needed (#18041)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7402
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
mtmd: enhance image resizing in llava_uhd (#18014)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: