ggml-cpu: Support s390x SIMD Instruction Set (#12019)
* ggml: add s390x ARCH_FLAGS for compilation

* ggml: add SIMD for s390x using vector intrinsics

  SIMD is activated for:
  * ggml_vec_dot_f32
  * ggml_vec_dot_f16
  * ggml_vec_mad_f32
  * ggml_vec_mad_f16
  * ggml_vec_mad_f32_unroll
  * ggml_vec_scale_f32
  * ggml_vec_scale_f16

  SIMD is NOT activated for:
  * ggml_vec_dot_f16_unroll (pending bugfix)

* ggml: fix missing escape character in GGML_F32x4_REDUCE

* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR

* ggml: fix s390x GGML_F32x4_REDUCE

* ggml: full SIMD activation for F32,F16 s390x

* ggml: add option to disable s390x VXE/VXE2

* ggml: change vecintrin.h include to ggml-cpu-impl

  * add __VXE__ and __VXE2__ macros

* cmake: add s390x target detection for VX/VXE/VXE2

* ggml: move s390x vector intrinsics to ggml-cpu-impl.h

* ggml: s390x Q8_0 SIMD

* ggml: correct documentation for Q8_0

* ggml: s390x reduce code complexity Q8_0

* ggml: s390x bugfix typo Q8_0

* ggml: s390x SIMD activated for Q4_1

* ggml: s390x inline vec_reve

* ggml: s390x SIMD activation for Q4_0

* ggml: add VXE backend feature

* ggml: remove test.py

* ggml: s390x SIMD activation for quantize_row_q8_0

* ggml: s390x SIMD activation for quantize_row_q8_1

* ggml: s390x SIMD activation for iq4_xs

* ggml: bugfix iq4_xs

* ggml: s390x SIMD activation for iq4_nl

* ggml: add float, double, and long vector data type

* ggml: clean up iq4_xs SIMD

* ggml: fix improper use of restrict keyword

* ggml: update warning message for ggml_vec_tbl

* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K

* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs

* ggml: switch to restrict for iq4_nl

* ggml: slight dot product speed improvement for q4_1_q8_1

* ggml: s390x SIMD activation for q6_K

* ggml: add missing `_t` to ggml_int8x16x4_t

* ggml: fix missing `_t` for ggml_vec_xl_s8x4

* ggml: fix more missing `_t`

* ggml: add unroll and prefetch to Q8_0

  increase of 3.86% for prompt processing and 32.22% for token generation

* ggml: patch Q8_0 to use proper vector sizes

* ggml: optimise Q8_0 dot prod compute kernel further

* ggml: add unroll and prefetch to Q4_1

* ggml: refactor Q6_K variable naming for readability

* ggml: fix Q6_K typos

* ggml: s390x SIMD activation for Q5_K

* ggml: fix wrong char*x16_t naming

* ggml: Q5_K y0 wrong signness

* ggml: fix Q5_K invalid uchar type

* ggml: fix Q5_K invalid uchar type

* ggml: s390x SIMD activation for Q4_K

* ggml: fix Q4_K invalid vector intrinsics

* ggml: simplify ggml_padd_s16 compute kernel

* ggml: correct ggml-cpu vxe wording

* ggml: change ggml_aligned_malloc alignment to 256

  256 is the cache line size for s390x platforms

* ggml: resolve pr merge via cherry-pick 225bbbf

* ggml : fix LoongArch compile error with 128-bit SIMD (#11701)

* ggml: resolve pr merge via cherry-pick 4571953

* ggml: cmake remove fork when determining s390x machine type

  thank you @ericcurtin

---------

Signed-off-by: Aaron Teo <[email protected]>
Co-authored-by: Jinyang He <[email protected]>
Co-authored-by: junchao-zhao <[email protected]>
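
One entry in the commit message changes the ggml_aligned_malloc alignment to 256 because that is the cache line size on s390x. The sketch below illustrates that idea in portable C11 and is not the actual ggml code: `SKETCH_CACHE_LINE`, `sketch_round_up`, `sketch_aligned_malloc`, and `sketch_is_aligned` are hypothetical names introduced here, and the use of `aligned_alloc` is an assumption about how one might do this rather than what the commit does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption taken from the commit message: 256-byte cache lines on s390x. */
#define SKETCH_CACHE_LINE ((size_t) 256)

/* Round a size up to the next multiple of the cache line size.
 * C11 aligned_alloc expects the size to be a multiple of the alignment.
 * Overflow for sizes near SIZE_MAX is not handled in this sketch. */
static size_t sketch_round_up(size_t size) {
    return (size + SKETCH_CACHE_LINE - 1) / SKETCH_CACHE_LINE * SKETCH_CACHE_LINE;
}

/* Hypothetical stand-in for an aligned allocator; returns NULL on failure
 * or when size is 0. Release the result with free(). */
static void * sketch_aligned_malloc(size_t size) {
    /* The alignment must be a power of two. */
    assert((SKETCH_CACHE_LINE & (SKETCH_CACHE_LINE - 1)) == 0);
    if (size == 0) {
        return NULL;
    }
    return aligned_alloc(SKETCH_CACHE_LINE, sketch_round_up(size));
}

/* Returns 1 if ptr starts on a cache line boundary, 0 otherwise. */
static int sketch_is_aligned(const void * ptr) {
    return ((uintptr_t) ptr % SKETCH_CACHE_LINE) == 0;
}
```

A caller would request a buffer with `sketch_aligned_malloc(n)`, check the result for NULL, and release it with `free`. Starting every buffer on a cache line boundary means a tensor's first element never shares a line with unrelated data, which is the usual motivation for matching the allocator's alignment to the platform's cache line size.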