Releases: ggml-org/llama.cpp

b4778

25 Feb 16:06
a82c9e7
vulkan: fix assertion when qy_needs_dequant (#12068)

Looks like a copy/paste bug from qx_needs_dequant.
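
A minimal sketch of the bug class being fixed, assuming the usual pattern of parallel x/y dequantization paths in the Vulkan matmul setup; everything below except the two flag names is hypothetical, not the actual ggml-vulkan code:

```cpp
#include <cassert>

struct vk_pipeline; // opaque stand-in for the real pipeline type

// Hypothetical illustration: an assertion copied from the qx branch
// but left checking the x-side pipeline on the y branch.
void mul_mat_setup(bool qx_needs_dequant, bool qy_needs_dequant,
                   vk_pipeline * to_fp16_x, vk_pipeline * to_fp16_y) {
    if (qx_needs_dequant) {
        assert(to_fp16_x != nullptr); // guards the x path
    }
    if (qy_needs_dequant) {
        // before the fix (copy/paste): assert(to_fp16_x != nullptr);
        assert(to_fp16_y != nullptr); // the y path must check its own pipeline
    }
}
```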

b4777

25 Feb 13:21
401af80
server: handle echo=false on /v1/completions (#12060)
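
For context (from the OpenAI API convention, not this PR): `echo` controls whether the response includes the prompt text in front of the generated completion; `echo=false` is the default and returns only the generated text.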

b4776

25 Feb 13:16
c132239
add OP sigmoid (#12056)

Co-authored-by: Judd <[email protected]>
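
For reference, sigmoid here is the standard logistic function (textbook definition, not taken from the patch):

$$\sigma(x) = \frac{1}{1+e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1-\sigma(x)\bigr)$$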

b4775

25 Feb 13:15
393fca6
ggml-cpu: Fix build with sve (#12059)

* ggml-cpu: Fix build with sve

Signed-off-by: Molly Sophia <[email protected]>

* ggml-cpu: Remove unused variable in sve q3_k vec dot

Signed-off-by: Molly Sophia <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>
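
SVE is Arm's Scalable Vector Extension; the breakage presumably triggered when compiling for an SVE-capable target, e.g. with `-march=armv8.2-a+sve` in the C/C++ flags (an illustrative compiler flag, not one named in the commit).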

b4774

25 Feb 12:56
61d4f39
vulkan: implement more backpropagation operators (#11914)

* vulkan: implement GGML_OP_ROPE_BACK

* vulkan: implement GGML_OP_RMS_NORM_BACK

* vulkan: implement GGML_OP_SILU_BACK

* vulkan: implement GGML_OP_SOFTMAX_BACK
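
For reference, the gradients these backward kernels compute follow from standard calculus (standard results, not taken from the shaders). With $\mathrm{silu}(x) = x\,\sigma(x)$, $y = \mathrm{softmax}(x)$, and $g$ the incoming gradient:

$$\frac{d}{dx}\,\mathrm{silu}(x) = \sigma(x)\,\bigl(1 + x\,(1-\sigma(x))\bigr), \qquad \frac{\partial L}{\partial x_i} = y_i\Bigl(g_i - \sum_j g_j\, y_j\Bigr)$$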

b4773

25 Feb 11:29
0b52745
server: support add_generation_prompt query param (#12062)
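
`add_generation_prompt` is the standard chat-template flag (as in Hugging Face's `apply_chat_template`) that controls whether the rendered template ends with the tokens that open an assistant turn; exposing it as a query parameter lets clients toggle that behavior per request.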

b4771

25 Feb 10:21
3e9a286
llama : expose llama_model_n_head_kv in the API (#11997)

It's useful to expose this at the library layer, since it's a key model
parameter (e.g. for figuring out how much KV cache memory is needed).
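
A sketch of the estimate mentioned above, assuming an F16 cache and the llama-style head_dim = n_embd / n_head; only `llama_model_n_head_kv` is confirmed by this change, the other `llama_model_n_*` getters and the arithmetic are assumptions:

```cpp
#include <cstddef>
#include "llama.h"

// Back-of-the-envelope F16 KV cache size:
// 2 tensors (K and V) per layer, each n_head_kv * head_dim * n_ctx
// elements at 2 bytes per F16 value.
static size_t kv_cache_bytes_f16(const struct llama_model * model, int n_ctx) {
    const int n_layer   = llama_model_n_layer(model);
    const int n_head_kv = llama_model_n_head_kv(model); // exposed by this change
    const int n_embd    = llama_model_n_embd(model);
    const int n_head    = llama_model_n_head(model);
    const int head_dim  = n_embd / n_head; // assumption: holds for llama-style models
    return (size_t) 2 * n_layer * n_head_kv * head_dim * n_ctx * 2 /* bytes per F16 */;
}
```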

b4770

25 Feb 10:15
58d07a8
metal : copy kernels for quant to F32/F16 conversions (#12017)

metal: use dequantize_q templates

---------

Co-authored-by: Georgi Gerganov <[email protected]>
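
These kernels let the Metal backend copy quantized tensor data directly into F32/F16 destinations, reusing the existing `dequantize_q` shader templates (the motivation is inferred from the title, not stated in the commit).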

b4769

24 Feb 22:23
34a846b
opencl: fix for small models (#11950)

* opencl: fix small shape gemv, remove unused extensions

* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size

* opencl: fix for token length < 4

* opencl: use wave size of 64 for all Adreno GPUs

---------

Co-authored-by: Shawn Gu <[email protected]>
Co-authored-by: Skyler Szot <[email protected]>

b4768

24 Feb 16:54
7a2c913
llava : Add Granite Vision Support (#11794)

* Add super wip scripts for multimodal granite gguf

Signed-off-by: Alex-Brooks <[email protected]>

* Add example for converting mmgranite to gguf

Signed-off-by: Alex-Brooks <[email protected]>

* remove hardcoded path

Signed-off-by: Alex-Brooks <[email protected]>

* Add vision feature layer to gguf params

Signed-off-by: Alex-Brooks <[email protected]>

* Clean up llava surgery and remove name substitution hacks

Signed-off-by: Alex-Brooks <[email protected]>

* Add transformers llava next tensor name mapping

Signed-off-by: Alex-Brooks <[email protected]>

* Make siglip / openclip mutually exclusive

Signed-off-by: Alex-Brooks <[email protected]>

* Fix projector linear substitution

Signed-off-by: Alex-Brooks <[email protected]>

* Fix linear 2 substitution index

Signed-off-by: Alex-Brooks <[email protected]>

* Increase max flattened gridpoints to 64

Signed-off-by: Alex-Brooks <[email protected]>

* Fix hardcoded concat for multiple feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Pull vision feature layers out of gguf keys

Signed-off-by: Alex-Brooks <[email protected]>

* fix num gridpoints and use all layers

Signed-off-by: Alex-Brooks <[email protected]>

* Avoid dropping last image encoder layer in llava models

Signed-off-by: Alex-Brooks <[email protected]>

* Use 10 for max number of patches

Signed-off-by: Alex-Brooks <[email protected]>

* Standardize vision feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Cleanup logs

Signed-off-by: Alex-Brooks <[email protected]>

* Update comment for vision feature layer init

Signed-off-by: Alex-Brooks <[email protected]>

* Update notes for alternative to legacy llm conversion script

Signed-off-by: Alex-Brooks <[email protected]>

* Fix notes rendering

Signed-off-by: Alex-Brooks <[email protected]>

* Add v prefix to vision feature layer log

Signed-off-by: Alex-Brooks <[email protected]>

* Use current defaults for feature layer

Signed-off-by: Alex-Brooks <[email protected]>

* Use constant for max gridpoints / feat layers, style fixes

Signed-off-by: Alex-Brooks <[email protected]>

* clarify non-negative feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Remove CLIP_API from func signature

Signed-off-by: Alex-Brooks <[email protected]>

* USE MAX_IMAGE_FEATURE_LAYERS const in layer calc

Signed-off-by: Alex-Brooks <[email protected]>

* Clarify feature layers are non-negative ints and not uint

Signed-off-by: Alex-Brooks <[email protected]>

* Fix condition for reading feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* pop last llava layer when feature layers are unset

Signed-off-by: Alex-Brooks <[email protected]>

* Fix unset vision layer 0

Signed-off-by: Alex-Brooks <[email protected]>

* Update examples/llava/clip.cpp

Co-authored-by: Xuan-Son Nguyen <[email protected]>

* Reenable assertion for out of bounds get_rows

Signed-off-by: Alex-Brooks <[email protected]>

* Use std vector for gridpoints and feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Calculate max feature layer at load time

Signed-off-by: Alex-Brooks <[email protected]>

* Include base patch for granite vision allocation

Signed-off-by: Alex-Brooks <[email protected]>

* Fix trailing whitespace

Signed-off-by: Alex-Brooks <[email protected]>

* Add max num patches = 10 back for minicpmv

Signed-off-by: Alex-Brooks <[email protected]>

* Use unordered set to store feature layers

Co-authored-by: Xuan-Son Nguyen <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>

* Use max feature layer for postnorm

Signed-off-by: Alex-Brooks <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: Alex-Brooks <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
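
Once a Granite Vision checkpoint and its projector are converted to GGUF with the scripts added here, it should run like other llava-family models, e.g. `llama-llava-cli -m granite-vision.gguf --mmproj mmproj.gguf --image photo.jpg -p "Describe this image."` (an illustrative invocation with placeholder file names, not a command taken from the PR).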