Create example for use of reasoning-gym with OpenRLHF #10

Open
1 of 2 tasks
andreaskoepf opened this issue Jan 25, 2025 · 7 comments

Comments

@andreaskoepf
Contributor

andreaskoepf commented Jan 25, 2025

Create an example configuration for OpenRLHF to train a 1B or 3B model with REINFORCE++ and the simple arithmetic reasoning-gym dataset.

  • Locally running example using transformers `generate` for inference
  • Ray-based variant
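Beyond the trainer configuration, an RL example like this needs a rule-based reward that checks the model's answer against the dataset's reference answer. Below is a minimal sketch, assuming a binary correctness reward that compares the last integer in a completion with the expected result. `make_arithmetic_task`, `extract_answer` and `reward` are hypothetical helpers for illustration only. They are not reasoning-gym or OpenRLHF APIs; the real reasoning-gym dataset supplies its own questions and answer scoring.

```python
import random
import re
from typing import Optional


def make_arithmetic_task(rng: random.Random) -> dict:
    """Hypothetical stand-in for a simple arithmetic item: a question plus its integer answer."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return {"question": f"What is {a} {op} {b}?", "answer": str(answer)}


def extract_answer(completion: str) -> Optional[str]:
    """Return the last (optionally negative) integer in the completion, or None if there is none."""
    matches = re.findall(r"-?\d+", completion)
    return matches[-1] if matches else None


def reward(completion: str, task: dict) -> float:
    """Binary reward (an assumption): 1.0 if the extracted answer matches the reference, else 0.0."""
    return 1.0 if extract_answer(completion) == task["answer"] else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    task = make_arithmetic_task(rng)
    print(task["question"], reward(f"The answer is {task['answer']}.", task))
```

A binary reward keeps the signal simple for a first baseline; partial credit or format rewards could be layered on later.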
@andreaskoepf
Contributor Author

A PPO example running on a local node with DeepSpeed can be found here: https://github.com/open-thought/reasoning-gym/tree/main/examples/OpenRLHF (wandb runs: https://wandb.ai/andreaskoepf/openrlhf_train_ppo)

@joesharratt1229
Collaborator

Happy to take this.

@joesharratt1229
Collaborator

Just realised the parent issue had already been assigned and the task is partly complete, apologies!

@andreaskoepf
Contributor Author

@joesharratt1229 a TRL example is currently still open... could you imagine working on that? I just saw this one using Optuna: https://github.com/s-smits/grpo-optuna/blob/main/main.py (but we could do the baseline first...)

@andreaskoepf
Contributor Author

andreaskoepf commented Feb 3, 2025

BTW, alternatively, the Ray variant for OpenRLHF is also still missing, if someone wants to work on that.

@joesharratt1229
Collaborator

Yes, happy to work on a TRL example.

@andreaskoepf
Contributor Author

@joesharratt1229 Fantastic, please feel free to assign yourself to the TRL example issue #53.
