Create example for use of reasoning-gym with OpenRLHF #10
Comments
A PPO example running on a local node with DeepSpeed can be found here: https://github.com/open-thought/reasoning-gym/tree/main/examples/OpenRLHF, wandb: https://wandb.ai/andreaskoepf/openrlhf_train_ppo
Happy to take this
Just realised the parent issue had already been assigned and the task is partly complete, apologies!
@joesharratt1229 a trl example is currently still open ... could you imagine working on that? Just saw this with optuna .. https://github.com/s-smits/grpo-optuna/blob/main/main.py (but we could first do the baseline...)
btw alternatively for OpenRLHF the ray variant is also still missing, if someone wants to work on that.
Yes happy to work on a trl example
@joesharratt1229 fantastic, please feel free to assign yourself to the trl example issue #53.
Create an example configuration for OpenRLHF to train a 1B or 3B model with REINFORCE++ and the simple arithmetic reasoning-gym dataset.
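
For anyone picking this up, here is a rough sketch of the kind of verifiable reward such a setup needs. It does not use the reasoning-gym or OpenRLHF APIs: `make_arithmetic_dataset` and `reward` are hypothetical stand-ins written in plain Python, and the entry layout (`question` / `answer` keys) is an assumption made for illustration only.

```python
import random
import re


def make_arithmetic_dataset(size, seed=0):
    """Generate simple two-operand problems (hypothetical stand-in, not the reasoning-gym dataset)."""
    rng = random.Random(seed)
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
    entries = []
    for _ in range(size):
        a, b = rng.randint(1, 99), rng.randint(1, 99)
        op = rng.choice(sorted(ops))
        entries.append({"question": f"What is {a} {op} {b}?", "answer": str(ops[op](a, b))})
    return entries


def reward(completion, entry):
    """Return 1.0 if the last integer in the completion equals the reference answer, else 0.0."""
    numbers = re.findall(r"-?\d+", completion)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == int(entry["answer"]) else 0.0


if __name__ == "__main__":
    for entry in make_arithmetic_dataset(3, seed=42):
        # A policy would produce the completion; here the reference answer is echoed back.
        print(entry["question"], reward(f"The answer is {entry['answer']}", entry))
```

In a real example the dataset and scoring would come from reasoning-gym itself, and the reward would be wired into the trainer through whatever hook OpenRLHF provides; the sketch only shows the shape of a binary, rule-based reward.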