madlad400 settings to improve performance #8510
MathiasSchindler
started this conversation in General
Replies: 1 comment
-
Hi, thank you for posting. I've used MADLAD400 in both its candle and llama.cpp incarnations, but not with ts_server. Now since it's Fabrice we're talking about here, I assume you mean that ts_server is much faster on the same hardware? My main problem with MADLAD400 performance is that I have to do translations one sentence at a time, and I assume that's quite a bit slower because llama-cli is invoked separately for each sentence. But it's still fast enough for my use case.
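One way around the per-sentence llama-cli startup cost is to keep a single llama-server instance running and send each sentence to its `/completion` endpoint, so the model is loaded only once. This is a hedged sketch, not tested against ts_server: the server URL, model setup, and the `<2de>` target-language prefix (MADLAD-400's convention for selecting the output language) are assumptions about your deployment.

```python
# Sketch: translate sentences one at a time against a long-running llama-server,
# avoiding a fresh llama-cli invocation (and model reload) per sentence.
# Assumes llama-server is already running with a MADLAD-400 GGUF model loaded.
import json
import urllib.request


def madlad_prompt(sentence: str, target_lang: str = "de") -> str:
    """MADLAD-400 selects the output language via a <2xx> prefix token."""
    return f"<2{target_lang}> {sentence}"


def translate(sentence: str, target_lang: str = "de",
              url: str = "http://localhost:8080/completion") -> str:
    # llama-server's /completion endpoint takes a JSON body with "prompt";
    # temperature 0 keeps the translation deterministic across runs.
    payload = json.dumps({
        "prompt": madlad_prompt(sentence, target_lang),
        "n_predict": 256,
        "temperature": 0.0,
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"].strip()
```

Looping over sentences with `translate()` then pays the model-load cost once per server start rather than once per sentence.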
-
I had previously used the madlad400 model with the textsynth server by Fabrice Bellard (https://bellard.org/ts_server/), and I am very happy to see that llama.cpp is now able to run this model as well.
I notice some differences in behavior. The first is that llama.cpp produces different translations from run to run, which is fully explained by the random seed: setting a fixed seed makes the translations reproducible.
The other is a significant difference in the speed at which the model runs (both setups on the same machine, using CUDA). Are there other llama.cpp settings that could bring the translation speed closer to what ts_server shows is possible?
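For reference, these are the llama-cli flags that most commonly affect throughput on a CUDA build. This is a hedged fragment, not a measured comparison with ts_server: the model file name and the `<2de>` target-language prefix are placeholders, and the actual gains depend on your GPU and build options.

```shell
# Sketch: llama-cli knobs that commonly affect speed and reproducibility.
# -ngl 99   : offload all model layers to the GPU (usually the biggest win)
# -fa       : enable flash attention, if the build and GPU support it
# --temp 0  : greedy decoding, which also makes translations deterministic
# --seed 42 : fixed seed, for reproducible sampling when temperature > 0
./llama-cli -m madlad400-3b-mt.gguf -ngl 99 -fa --temp 0 --seed 42 \
    -p "<2de> Hello, how are you?"
```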