Replies: 1 comment
-
Hey, I think that this PR should help you.
-
Hi everyone!
I have a transformer policy in an RL setting. Can vLLM accelerate sampling actions from this policy?
IIRC, vLLM is fast at consecutive generation, where tokens are sampled one after another until <eos>. In RL environments, however, we cannot sample actions seamlessly: an action is sampled and fed to the environment to get the next observation, and then another query with the new observation concatenated is made for the next action. Can vLLM speed up sampling in this setting as well? I think it boils down to two questions: whether vLLM's forward pass is faster than a plain PyTorch model's, and whether caching of prefixes shared between distinct queries can give me faster sampling (since most of the observation sequence is identical between consecutive queries).
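To make the setting concrete, here is a minimal sketch of the loop I would like to accelerate, assuming vLLM's offline LLM API with enable_prefix_caching=True (automatic prefix caching). The DummyEnv class, the prompt format, and the model name are just illustrative placeholders, not part of vLLM:

```python
from vllm import LLM, SamplingParams

class DummyEnv:
    """Placeholder standing in for the real RL environment."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return "initial state"
    def step(self, action):
        self.t += 1
        return f"state after {action}", 0.0, self.t >= 5, {}

# enable_prefix_caching lets vLLM reuse the KV cache for shared prompt
# prefixes, which is exactly the part that repeats between queries.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)
params = SamplingParams(temperature=1.0, max_tokens=16, stop=["\n"])

env = DummyEnv()
observation = env.reset()
prompt = f"Observation: {observation}\nAction:"

done = False
while not done:
    # The whole (growing) prompt is re-sent each step; prefix caching
    # should avoid recomputing attention over the unchanged prefix.
    output = llm.generate([prompt], params)[0]
    action = output.outputs[0].text.strip()

    observation, reward, done, _ = env.step(action)
    # Append the sampled action and the new observation for the next query.
    prompt += f" {action}\nObservation: {observation}\nAction:"
```

Is this the kind of workload where prefix caching helps, or does the per-step round trip kill most of the benefit?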
As a side question, does vLLM support sampling from ad hoc PyTorch architectures?
Thank you very much for your help.