You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This post-training codebase supports GRPO / RL from verifiable rewards. It would be good to have an example in the repo showing how it can be used with reasoning-gym procedural datasets
This post-training codebase supports GRPO / RL from verifiable rewards. It would be good to have an example in the repo showing how it can be used with reasoning-gym procedural datasets
https://github.com/allenai/open-instruct
Existing examples (with trl, veRL, unsloth, OpenRLHF) can be used as inspiration
The text was updated successfully, but these errors were encountered: