CUDA: compress-mode size #12029

Green-Sky · 2025-02-22T18:01:05Z

CUDA 12.8 added an nvcc option (`--compress-mode`) to specify stronger compression for binaries.

I ran some tests in CI with the new CUDA 12.8 Ubuntu Docker image:

`89-real` arch

In this scenario, it appears that nothing is compressed by default at all?

| mode | ggml-cuda.so |
| --- | --- |
| none | 64M |
| speed (default) | 64M |
| balanced | 64M |
| size | 18M |

`60;61;70;75;80` arches

| mode | ggml-cuda.so |
| --- | --- |
| none | 994M |
| speed (default) | 448M |
| balanced | 368M |
| size | 127M |
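To put the numbers from the two tables in relative terms, here is a small sketch that computes how much smaller each mode is than `none`. The sizes are copied from the tables as reported (rounded to whole megabytes), so the percentages are approximate; the function and variable names are only illustrative.

```python
# Library sizes in MB, copied from the tables above (rounded values).
SIZES_MB = {
    "89-real": {"none": 64, "speed": 64, "balanced": 64, "size": 18},
    "60;61;70;75;80": {"none": 994, "speed": 448, "balanced": 368, "size": 127},
}


def reduction_percent(baseline, compressed):
    """Return how many percent smaller `compressed` is than `baseline`."""
    return 100.0 * (1.0 - compressed / baseline)


def summarize(sizes):
    """Map each arch list to {mode: percent reduction relative to `none`}."""
    return {
        arch: {mode: reduction_percent(by_mode["none"], mb) for mode, mb in by_mode.items()}
        for arch, by_mode in sizes.items()
    }


if __name__ == "__main__":
    for arch, row in summarize(SIZES_MB).items():
        for mode, pct in row.items():
            print(f"{arch:>16}  {mode:<9} {pct:5.1f}% smaller than none")
```

With these inputs, `size` comes out roughly 72% smaller than `none` for `89-real` and roughly 87% smaller for the multi-arch build. Compared with the default `speed` mode, the multi-arch build is also roughly 72% smaller (127M vs 448M).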

I did not test the runtime load overhead this should incur. But for most ggml-cuda use cases the processes are usually fairly long-lived, so the trade-off seems reasonable to me.
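If someone wants to check the load overhead, one rough approach is to time a short-lived process end to end against builds made with different compression modes. The sketch below is an assumption on my part, not something that was run for this PR: it measures whole-process wall time (not decompression alone), and the command to run is a placeholder you would replace with a small program linked against the `ggml-cuda.so` under test.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(cmd, runs=5):
    """Run `cmd` `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder command so the script runs anywhere; pass the real
    # short-lived binary (and its arguments) on the command line instead.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"median over 5 runs: {median_wall_time(cmd):.3f} s")
```

Running it once per build (for example one built with `speed` and one with `size`) and comparing the medians would give a first impression; disk cache state and driver initialization will add noise, so several runs per build are advisable.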


CUDA: compress mode size

d7580f2


github-actions bot added the `Nvidia GPU` and `ggml` labels Feb 22, 2025

Green-Sky marked this pull request as ready for review February 24, 2025 12:21
