Is the continuous batching function enabled by default in vllm? #547
-
Is the continuous batching function enabled by default in vllm? Can this feature be turned on or off selectively?
-
Yes, this is enabled by default and cannot be turned off. Turning off continuous batching would require a rewrite of our system architecture and would bring no performance benefit, so we did not implement it.
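For anyone who wants to bound how aggressively the scheduler batches, here is a minimal offline sketch. `max_num_seqs` and `max_num_batched_tokens` are the engine arguments that, to my knowledge, cap how many sequences and tokens the continuous-batching scheduler packs into one step; the exact defaults and names can differ across vLLM versions, so treat this as an illustration rather than an official recommendation.

```python
from vllm import LLM, SamplingParams

# Continuous batching itself cannot be disabled, but the scheduler's
# per-step batch can be bounded via engine arguments (names taken from
# EngineArgs; verify them for your vLLM version).
llm = LLM(
    model="facebook/opt-125m",
    max_num_seqs=8,               # at most 8 sequences scheduled per step
    max_num_batched_tokens=2048,  # at most 2048 tokens processed per step
)

prompts = ["Hello, my name is", "The capital of France is"]
outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```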
-
Does continuous batching have a way to be adjusted? I found that latency increases significantly when using the api_server compared to offline inference.
-
How do we control the batch size and timeout for continuous batching, though?
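As far as I know, there is no explicit batching timeout: the scheduler re-forms the batch on every engine step and admits whatever requests are waiting. The batch size can be bounded with the same engine arguments when launching the API server, e.g. `--max-num-seqs` and `--max-num-batched-tokens` (flag names assumed from `EngineArgs`; check `--help` for your version). A hedged sketch of the programmatic equivalent:

```python
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Assumed approach: cap the continuous-batching scheduler rather than set a
# timeout, which (to my knowledge) vLLM does not expose.
engine_args = AsyncEngineArgs(
    model="facebook/opt-125m",
    max_num_seqs=16,              # upper bound on sequences per scheduler step
    max_num_batched_tokens=4096,  # upper bound on tokens per scheduler step
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```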