Replies: 1 comment
-
Hey, I think that this PR should help you.
-
Hi everyone!
I have a transformer policy in an RL setting. Can vLLM accelerate sampling actions from this policy?
IIRC, vLLM is fast at consecutive generation, where tokens are sampled one after another until <eos>. In RL environments, however, we cannot sample actions seamlessly: an action is sampled and fed to the environment to get the next observation, and then another query with the new observation concatenated is made for the next action. Can vLLM speed up sampling in this setting as well? I think it boils down to two questions: whether vLLM's forward pass is faster than a plain PyTorch model's, and whether caching of prefixes shared between distinct queries can give me faster sampling (since most of the observation sequence is identical between consecutive queries).
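To make the setting concrete, here is a minimal sketch of the loop I would like to accelerate, assuming vLLM's offline LLM API with enable_prefix_caching=True (automatic prefix caching). The DummyEnv class, the prompt format, and the model name are just illustrative placeholders, not part of vLLM:

```python
from vllm import LLM, SamplingParams

class DummyEnv:
    """Placeholder standing in for the real RL environment."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return "initial state"
    def step(self, action):
        self.t += 1
        return f"state after {action}", 0.0, self.t >= 5, {}

# enable_prefix_caching lets vLLM reuse the KV cache for shared prompt
# prefixes, which is exactly the part that repeats between queries.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)
params = SamplingParams(temperature=1.0, max_tokens=16, stop=["\n"])

env = DummyEnv()
observation = env.reset()
prompt = f"Observation: {observation}\nAction:"

done = False
while not done:
    # The whole (growing) prompt is re-sent each step; prefix caching
    # should avoid recomputing attention over the unchanged prefix.
    output = llm.generate([prompt], params)[0]
    action = output.outputs[0].text.strip()

    observation, reward, done, _ = env.step(action)
    # Append the sampled action and the new observation for the next query.
    prompt += f" {action}\nObservation: {observation}\nAction:"
```

Is this the kind of workload where prefix caching helps, or does the per-step round trip kill most of the benefit?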
As a side question, does vLLM support sampling from ad hoc PyTorch architectures?
Thank you very much for your help.