madlad400 settings to improve performance #8510
MathiasSchindler
started this conversation in General
Replies: 1 comment
-
Hi, thank you for posting. I've used MADLAD400 in both its candle and llama.cpp incarnations, but not with ts_server. Now since it's Fabrice we're talking about here, I assume you mean that ts_server is much faster on the same hardware? My main problem with MADLAD400 performance is that I have to do translations one sentence at a time, and I assume that's quite a bit slower because llama-cli is invoked separately for each sentence. But it's still fast enough for my use case.
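One way around the per-sentence llama-cli startup cost is to keep a single llama-server instance running and send each sentence to its `/completion` endpoint, so the model is loaded only once. This is a hedged sketch, not tested against ts_server: the server URL, model setup, and the `<2de>` target-language prefix (MADLAD-400's convention for selecting the output language) are assumptions about your deployment.

```python
# Sketch: translate sentences one at a time against a long-running llama-server,
# avoiding a fresh llama-cli invocation (and model reload) per sentence.
# Assumes llama-server is already running with a MADLAD-400 GGUF model loaded.
import json
import urllib.request


def madlad_prompt(sentence: str, target_lang: str = "de") -> str:
    """MADLAD-400 selects the output language via a <2xx> prefix token."""
    return f"<2{target_lang}> {sentence}"


def translate(sentence: str, target_lang: str = "de",
              url: str = "http://localhost:8080/completion") -> str:
    # llama-server's /completion endpoint takes a JSON body with "prompt";
    # temperature 0 keeps the translation deterministic across runs.
    payload = json.dumps({
        "prompt": madlad_prompt(sentence, target_lang),
        "n_predict": 256,
        "temperature": 0.0,
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"].strip()
```

Looping over sentences with `translate()` then pays the model-load cost once per server start rather than once per sentence.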
-
I had previously used the madlad400 model with the textsynth server by Fabrice Bellard (https://bellard.org/ts_server/), and I am very happy to see that llama.cpp is now able to run this model as well.
I notice some differences in behavior. The first is that llama.cpp produces different translations from run to run, which is fully explained by the random seed: setting a fixed seed makes the translations reproducible.
The other is a significant difference in the speed at which the model runs (both setups on the same machine, using CUDA). Are there other llama.cpp settings that could bring the translation speed closer to what ts_server shows is possible?
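For reference, these are the llama-cli flags that most commonly affect throughput on a CUDA build. This is a hedged fragment, not a measured comparison with ts_server: the model file name and the `<2de>` target-language prefix are placeholders, and the actual gains depend on your GPU and build options.

```shell
# Sketch: llama-cli knobs that commonly affect speed and reproducibility.
# -ngl 99   : offload all model layers to the GPU (usually the biggest win)
# -fa       : enable flash attention, if the build and GPU support it
# --temp 0  : greedy decoding, which also makes translations deterministic
# --seed 42 : fixed seed, for reproducible sampling when temperature > 0
./llama-cli -m madlad400-3b-mt.gguf -ngl 99 -fa --temp 0 --seed 42 \
    -p "<2de> Hello, how are you?"
```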