Add example of RL training using open-instruct #189

olliestanley · 2025-02-22T01:04:17Z

open-instruct is a post-training codebase that supports GRPO / RL from verifiable rewards. It would be good to have an example in the repo showing how it can be used with reasoning-gym procedural datasets.

https://github.com/allenai/open-instruct

The existing examples (with trl, veRL, unsloth, OpenRLHF) can be used as inspiration.
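
For anyone picking this up, here is a rough sketch of the two pieces an RL-from-verifiable-rewards example generally needs: a reward that checks a completion against a procedurally generated answer, and the group-relative advantage normalization commonly associated with GRPO. Everything here is an assumption for illustration: `make_addition_dataset`, `verifiable_reward`, and `group_relative_advantages` are hypothetical names, the toy addition task only stands in for a reasoning-gym dataset, and none of this uses the actual open-instruct or reasoning-gym APIs.

```python
import random
import statistics


def make_addition_dataset(size, seed):
    """Hypothetical stand-in for a procedural dataset.

    Each entry pairs a question with a reference answer that can be
    checked programmatically. The same seed yields the same entries.
    """
    rng = random.Random(seed)
    entries = []
    for _ in range(size):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        entries.append({"question": f"What is {a} + {b}?", "answer": str(a + b)})
    return entries


def verifiable_reward(completion, entry):
    """Return 1.0 if the completion's last token equals the reference answer, else 0.0.

    Assumption: the model is prompted to end its output with the bare answer.
    """
    tokens = completion.strip().split()
    return 1.0 if tokens and tokens[-1] == entry["answer"] else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Subtracts the group mean and divides by the group standard deviation;
    eps avoids division by zero when all rewards in the group are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In a real example the dataset and scoring would come from reasoning-gym and the training loop from open-instruct; the sketch only shows the shape of the reward signal that would be passed between them.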

andreaskoepf added and removed the good first issue label on Feb 22, 2025
