Replies: 1 comment
You can use the asynchronous version of the OpenAI client and send your requests concurrently; the server will then batch the in-flight requests on its own.
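Something along these lines (untested sketch; the endpoint, model name, and prompts are placeholders):

```python
import asyncio
from openai import AsyncOpenAI

# vLLM's OpenAI-compatible endpoint; the API key is unused unless you set one.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request(prompt: str) -> str:
    resp = await client.completions.create(
        model="your-model-name",  # placeholder: whatever you passed to --model
        prompt=prompt,
        max_tokens=128,
    )
    return resp.choices[0].text

async def main(prompts: list[str]) -> list[str]:
    # All requests are in flight at once, so the server's scheduler can
    # batch them (up to max_num_seqs) instead of running one at a time.
    return await asyncio.gather(*(one_request(p) for p in prompts))

if __name__ == "__main__":
    print(asyncio.run(main(["Hello,", "The capital of France is"])))
```

With all requests submitted at once, the server can schedule them together up to max_num_seqs, rather than serving a single sequence at a time.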
-
I've tried both offline batch inference and server inference. With the same dataset and the same model, server inference is more than twice as slow as offline batch inference.
My guess is that the main reason is that offline inference runs requests in batches, since the default value of max_num_seqs is 256. (Please correct me if my understanding is wrong.)
If I change the offline inference to feed prompts one by one, it also becomes very slow.
However, I don't know how to get server inference to run in a batched way as well.
I'm also wondering whether there are other settings that could be slowing the server down. If there are, please let me know, I'd be really grateful!
The offline batch inference script is like this:
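(Simplified to the essentials; the model path and prompts are placeholders.)

```python
from vllm import LLM, SamplingParams

# max_num_seqs defaults to 256, so generate() runs the prompts in large batches.
llm = LLM(model="your-model-name")  # placeholder model path
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

prompts = ["Hello,", "The capital of France is"]  # the whole dataset in practice
outputs = llm.generate(prompts, sampling_params)  # single call over all prompts

for output in outputs:
    print(output.outputs[0].text)
```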
The server script:
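Essentially the stock OpenAI-compatible server at default settings, roughly `python -m vllm.entrypoints.openai.api_server --model your-model-name` (model path is a placeholder; no extra flags shown here).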
And the client script:
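(Simplified; the requests go out one at a time with the synchronous client. Endpoint and model name are placeholders.)

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompts = ["Hello,", "The capital of France is"]  # the whole dataset in practice
results = []
for prompt in prompts:
    # One blocking request per prompt, so the server almost never has
    # more than one sequence to batch together.
    resp = client.completions.create(
        model="your-model-name",  # placeholder
        prompt=prompt,
        max_tokens=128,
    )
    results.append(resp.choices[0].text)
```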