Releases · ggml-org/llama.cpp

16 Dec 08:25

2995341

b7418 Latest

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
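For deployment scripts affected by this change, one option is to accept either archive format during the transition. The following is a minimal sketch, not part of llama.cpp: `candidate_asset_names` and `extract_release` are hypothetical helpers, and the `llama-<tag>-bin-<platform>` naming is inferred from the asset names listed for this release, so the platform string for a given Linux build is an assumption to check against the actual asset list.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import List


def candidate_asset_names(tag: str, platform: str) -> List[str]:
    """Asset names to try, preferring the new .tar.gz format over .zip.

    The naming pattern is an assumption based on assets such as
    llama-b7418-bin-macos-arm64.tar.gz.
    """
    base = f"llama-{tag}-bin-{platform}"
    return [f"{base}.tar.gz", f"{base}.zip"]


def extract_release(archive: Path, dest: Path) -> None:
    """Extract a downloaded release archive, whichever format it uses."""
    dest.mkdir(parents=True, exist_ok=True)
    name = archive.name
    if name.endswith(".tar.gz"):
        # Use the safer "data" extraction filter where Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive, "r:gz") as tf:
            tf.extractall(dest, **kwargs)
    elif name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    else:
        raise ValueError(f"unsupported archive format: {name}")
```

A script could loop over `candidate_asset_names(...)`, download the first name that exists, and pass the file to `extract_release`, so it keeps working both before and after the switch.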

llama : add support for NVIDIA Nemotron 3 Nano (#18058)

llama : add support for NVIDIA Nemotron 3 Nano

This commit adds support for the NVIDIA Nemotron 3 Nano model, so that
the model can be converted and run.

Co-authored-by: Georgi Gerganov

Assets 28

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-12-16T08:25:40Z

cudart-llama-bin-win-cuda-13.1-x64.zip
sha256:f96935e7e385e3b2d0189239077c10fe8fd7e95690fea4afec455b1b6c7e3f18
384 MB 2025-12-16T08:25:50Z

llama-b7418-bin-310p-openEuler-aarch64.tar.gz
sha256:57b1dc9c50307a42fe74be40809a8ac1fdd8bce08f9c1ff31f2c5d62b441b186
41.4 MB 2025-12-16T08:26:00Z

llama-b7418-bin-310p-openEuler-x86.tar.gz
sha256:a103ee5648e3566266bc2ae12a62ee99313881d838300532a7d507b6f4498092
45.3 MB 2025-12-16T08:26:02Z

llama-b7418-bin-910b-openEuler-aarch64.tar.gz
sha256:d804ff38644db5979cb6f2189c7d740b0c0ba555cd184118953adac6f059e2db
41.4 MB 2025-12-16T08:26:03Z

llama-b7418-bin-910b-openEuler-x86.tar.gz
sha256:e41a899ac7038346a1f8a19f022388168e3b133694f9a3a26fbb0b631c8c82b8
45.3 MB 2025-12-16T08:26:05Z

llama-b7418-bin-macos-arm64.tar.gz
sha256:f54cb7d758c7aaf5a5ebcc808a38cbc25faaae8b83443320dd897dcd9240d9c7
15.7 MB 2025-12-16T08:26:07Z

llama-b7418-bin-macos-arm64.zip
sha256:6b9e4f21822462bfea069ba3a07bf167bcef4dfca1d474c62a93af8080c51817
15.7 MB 2025-12-16T08:26:08Z

llama-b7418-bin-macos-x64.tar.gz
sha256:6caa46f8f2349b7fb571742be2085dda7d4c2c6453ff93449027143a24c9a9f4
40.5 MB 2025-12-16T08:26:09Z

llama-b7418-bin-macos-x64.zip
sha256:f12c8b3af92ef146d645d1ede835fdc1b21f43f76d274ad834b9d7e55425d6cb
40.4 MB 2025-12-16T08:26:11Z

Source code (zip)
2025-12-16T06:19:26Z

Source code (tar.gz)
2025-12-16T06:19:26Z
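Each binary asset above is listed with a `sha256:` digest, which can be checked after download. The following is a minimal sketch using only the Python standard library; `sha256_of` and `matches_listed_digest` are hypothetical helper names, not part of any llama.cpp tooling.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path: Path, listed: str) -> bool:
    """Compare a file against a digest written as 'sha256:<hex>' or bare hex."""
    expected = listed.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return hmac.compare_digest(sha256_of(path), expected)
```

For example, `matches_listed_digest(Path("llama-b7418-bin-macos-arm64.tar.gz"), "sha256:f54cb7d7...")` with the full digest from the list should return `True` for an intact download.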

16 Dec 00:45

github-actions

b7415

c45f89d

ggml-hexagon: mm for mtmd (#17894)

feat: add run_mtmd script for hexagon
fix: fix issue in fp16xfp32 mm
fix: remove opt_experiment for fp16xfp32 mm
fix: ggml-hexagon: matmul fp16xfp32 support non-contiguous src0
fix: fix syntax check for run-mtmd.sh for cli

Assets 28

16 Dec 01:09

github-actions

b7414

9d52f17

model : add KORMo model (#18032)

vocab: add KORMo Tokenizer
model: add KORMoForCausalLM
vocab: change pretokenizer to qwen2
lint: fix unintended line removal
model: make qwen2 bias tensor optional
model: use qwen2 architecture for KORMo

Assets 28

16 Dec 00:18

github-actions

b7413

4529c66

kv-cache: Fix state restore fragmented cache (#17982)

kv-cache : fix state restore with fragmented cache (#17527)

Change find_slot to allow non-contiguous allocation during state restore. Fixes the 'failed to find available cells in kv cache' error when restoring state to a fragmented cache.

tests : update logic
cleanup: tightened state_read_meta sig, added is_contiguous case
fix: state_read_meta arg reorder loose ends

Co-authored-by: Georgi Gerganov
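To illustrate why a contiguous-only search can fail on a fragmented cache while a non-contiguous one succeeds, here is a toy model. It is only a sketch under simplifying assumptions: the cache is a list of booleans (`True` means the cell is in use), and this `find_slot` is a hypothetical stand-in that shares nothing but its name with the llama.cpp function; it is not the actual implementation.

```python
from typing import List, Optional


def find_slot(used: List[bool], n: int, contiguous: bool) -> Optional[List[int]]:
    """Return indices of n free cells, or None if they cannot be found.

    Toy model only: `used[i]` is True when cell i is occupied.
    """
    if n <= 0:
        return []
    if contiguous:
        # Scan for a run of n consecutive free cells.
        run_start, run_len = 0, 0
        for i, is_used in enumerate(used):
            if is_used:
                run_start, run_len = i + 1, 0
            else:
                run_len += 1
                if run_len == n:
                    return list(range(run_start, run_start + n))
        return None
    # Non-contiguous: any n free cells will do, wherever they are.
    free = [i for i, is_used in enumerate(used) if not is_used]
    return free[:n] if len(free) >= n else None


# A fragmented cache: three free cells, but no two of them adjacent.
fragmented = [False, True, False, True, False, True]
print(find_slot(fragmented, 3, contiguous=True))
print(find_slot(fragmented, 3, contiguous=False))
```

In this model the contiguous search finds no slot even though enough cells are free, which mirrors the reported error; allowing scattered cells lets the restore proceed.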

Assets 28

15 Dec 19:47

github-actions

b7411

165caaf

metal: use shared buffers on eGPU (#17866)

metal: use shared buffers on eGPU

With #15906, I noticed an important regression when using the Metal backend on an eGPU.
This commit restores the previous behavior and adds an option to force its activation.

Assets 28

15 Dec 18:01

github-actions

b7410

96a181a

mtmd: refactor audio preprocessing (#17978)

mtmd: refactor audio preprocessing
refactor
wip
wip (2)
improve constructor
fix use_natural_log
fix padding for short input
clean up
remove need_chunking

Co-authored-by: Tarek

Assets 28

15 Dec 04:15

github-actions

b7406

4aced7a

[SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826)

support gpt-oss GPU by OP add-id, mul_mat for mxfp4, swiglu_oai, fix warning
fix faulty unit-test case, update ops.md
rebase, fix format issue

Assets 28

15 Dec 04:09

github-actions

b7405

745fa0e

model : add glm-asr support (#17901)

[model] add glm-asr support
fix format for ci
fix convert format for ci
update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review
check root architecture for convert hf script
fix conflict with upstream
fix convert script for glm asr & format clip-impl
format
restore hparams text
improved conversion

Co-authored-by: Sigbjørn Skjæret

Assets 28

14 Dec 22:34

github-actions

b7404

5239229

preset: handle negated arg, reverse the meaning if needed (#18041)

Assets 28

14 Dec 19:48

github-actions

b7402

37f5a10

mtmd: enhance image resizing in llava_uhd (#18014)

Assets 28
