Create example for use of reasoning-gym with OpenRLHF #10

andreaskoepf · 2025-01-25T23:49:38Z

Create an example configuration for OpenRLHF to train a 1B or 3B model with REINFORCE++ and the simple arithmetic reasoning-gym dataset.

andreaskoepf · 2025-01-29T07:11:40Z

joesharratt1229 · 2025-02-03T07:57:50Z

Happy to take this

joesharratt1229 · 2025-02-03T09:22:04Z

Just realised the parent issue had already been assigned and the task is partly complete, apologies!

andreaskoepf · 2025-02-03T12:06:34Z

@joesharratt1229 a trl example is currently still open ... could you imagine working on that? Just saw this one with optuna: https://github.com/s-smits/grpo-optuna/blob/main/main.py (but we could do the baseline first...)

andreaskoepf · 2025-02-03T12:09:11Z

Btw, alternatively, the Ray variant for OpenRLHF is also still missing, if someone wants to work on that.

joesharratt1229 · 2025-02-03T13:57:04Z

Yes, happy to work on a trl example.

andreaskoepf · 2025-02-03T14:43:26Z

@joesharratt1229 fantastic, please feel free to assign yourself to the trl example issue #53.
