Releases: ggml-org/llama.cpp
b4778
b4777
server: handle echo=false on /v1/completions (#12060)
b4776
add OP sigmoid (#12056) Co-authored-by: Judd <[email protected]>
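For reference, the operator named here is the logistic function, sigmoid(x) = 1 / (1 + exp(-x)). The following is a minimal standalone sketch of that formula in Python, not the ggml implementation; the function name `sigmoid` and the branch on the sign of `x` (used to avoid overflow in `exp`) are choices made for this illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        # exp(-x) <= 1 here, so the denominator stays in [1, 2].
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as exp(x) / (1 + exp(x)); exp(x) < 1 cannot overflow.
    e = math.exp(x)
    return e / (1.0 + e)
```

A naive `1 / (1 + math.exp(-x))` raises `OverflowError` in Python for large negative `x`, which is why the sketch branches.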
b4775
ggml-cpu: Fix build with sve (#12059)
* ggml-cpu: Fix build with sve
* ggml-cpu: Remove unused variable in sve q3_k vec dot
---------
Signed-off-by: Molly Sophia <[email protected]>
b4774
vulkan: implement more backpropagation operators (#11914)
* vulkan: implement GGML_OP_ROPE_BACK
* vulkan: implement GGML_OP_RMS_NORM_BACK
* vulkan: implement GGML_OP_SILU_BACK
* vulkan: implement GGML_OP_SOFTMAX_BACK
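To illustrate what two of these backward operators compute mathematically, here is a small Python sketch of the textbook gradients for SiLU and softmax. It is not the Vulkan shader code and does not model any ggml-specific parameters (for example scaling or masking); the function names `silu_back` and `softmax_back` are hypothetical and chosen only for this example.

```python
import math


def silu(x: float) -> float:
    """Forward SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def silu_back(x: float, dy: float) -> float:
    """Gradient of SiLU w.r.t. x, times the upstream gradient dy.

    d/dx [x * s(x)] = s(x) * (1 + x * (1 - s(x))), where s is the sigmoid.
    """
    s = 1.0 / (1.0 + math.exp(-x))
    return dy * s * (1.0 + x * (1.0 - s))


def softmax(xs: list[float]) -> list[float]:
    """Forward softmax over one row, shifted by the max for stability."""
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    total = sum(es)
    return [e / total for e in es]


def softmax_back(y: list[float], dy: list[float]) -> list[float]:
    """Gradient of softmax w.r.t. its input, given output y and upstream dy.

    dx_j = y_j * (dy_j - sum_i(dy_i * y_i))
    """
    dot = sum(yi * dyi for yi, dyi in zip(y, dy))
    return [yi * (dyi - dot) for yi, dyi in zip(y, dy)]
```

Both gradients can be checked against a central finite difference of the forward function, which is a common way to validate a new backend's backward kernels.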
b4773
server: support add_generation_prompt query param (#12062)
b4771
llama : expose llama_model_n_head_kv in the API (#11997) It's useful to be able to have this from the library layer as it's a key parameter of the model (e.g. to figure out how much KV cache memory is needed).
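As a rough sketch of the KV cache estimate mentioned above, the arithmetic can be written out as follows. This is plain Python, not a call into the llama.cpp API; the function `kv_cache_bytes` and its parameter names are hypothetical, and it assumes every layer has the same number of KV heads, that K and V share one head dimension, and that the cache is unquantized (2 bytes per element for F16). Allocator overhead and padding are ignored.

```python
def kv_cache_bytes(
    n_layer: int,
    n_ctx: int,
    n_head_kv: int,
    head_dim: int,
    bytes_per_elem: int = 2,  # assumption: F16 cache entries
) -> int:
    """Estimate KV cache size in bytes under the assumptions stated above."""
    # Each token stores one K vector and one V vector per KV head, per layer.
    elems_per_token_per_layer = 2 * n_head_kv * head_dim
    return n_layer * n_ctx * elems_per_token_per_layer * bytes_per_elem


# Hypothetical model shape: 32 layers, 4096-token context, head_dim 128.
gqa = kv_cache_bytes(n_layer=32, n_ctx=4096, n_head_kv=8, head_dim=128)
mha = kv_cache_bytes(n_layer=32, n_ctx=4096, n_head_kv=32, head_dim=128)
print(gqa // 2**20, "MiB with 8 KV heads")   # 512 MiB
print(mha // 2**20, "MiB with 32 KV heads")  # 2048 MiB
```

The estimate scales with the number of KV heads rather than the number of attention heads, which is why a model using grouped-query attention needs the KV head count specifically.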
b4770
metal : copy kernels for quant to F32/F16 conversions (#12017)
metal: use dequantize_q templates
---------
Co-authored-by: Georgi Gerganov <[email protected]>
b4769
opencl: fix for small models (#11950)
* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs
---------
Co-authored-by: Shawn Gu <[email protected]>
Co-authored-by: Skyler Szot <[email protected]>
b4768
llava : Add Granite Vision Support (#11794)
* Add super wip scripts for multimodal granite gguf
* Add example for converting mmgranite to gguf
* remove hardcoded path
* Add vision feature layer to gguf params
* Clean up llava surgery and remove name substitution hacks
* Add transformers llava next tensor name mapping
* Make siglip / openclip mutually exclusive
* Fix projector linear substitution
* Fix linear 2 substitution index
* Increase max flattened gridpoints to 64
* Fix hardcoded concat for multiple feature layers
* Pull vision feature layers out of gguf keys
* fix num gridpoints and use all layers
* Avoid dropping last image encoder layer in llava models
* Use 10 for max number of patches
* Standardize vision feature layers
* Cleanup logs
* Update comment for vision feature layer init
* Update notes for alternative to legacy llm conversion script
* Fix notes rendering
* Add v prefix to vision feature layer log
* Use current defaults for feature layer
* Use constant for max gridpoints / feat layers, style fixes
* clarify non-negative feature layers
* Remove CLIP_API from func signature
* USE MAX_IMAGE_FEATURE_LAYERS const in layer calc
* Clarify feature layers are non negative ints and not uint
* Fix condition for reading feature layers
* pop last llava layer when feature layers are unset
* Fix unset vision layer 0
* Update examples/llava/clip.cpp
* Reenable assertion for out of bounds get_rows
* Use std vector for gridpoints and feature layers
* Calculate max feature layer at load time
* Include base patch for granite vision allocation
* Fix trailing whitespace
* Add max num patches = 10 back for minicpmv
* Use unordered set to store feature layers
* Use max feature layer for postnorm
* Apply suggestions from code review
---------
Signed-off-by: Alex-Brooks <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>