Interactive training with reasoning-gym server #104

andreaskoepf · 2025-02-10T22:23:11Z

Vision:
Launch a training run and use CLI commands (or a web frontend) to monitor and manipulate the reasoning-gym dataset configuration, directly controlling the composition of the next batch, e.g. adding or removing datasets from a Composite or changing the difficulty. Tune configuration parameters and immediately see the response in the current training or eval run. This works towards a vision of LLM training in which humans oversee the evolving training of an LLM during RL and steer its development in the desired direction.

Implementation sketch:

Expose a REST API (access controlled via API key)
Offer endpoints to read & manipulate the active configuration & scoreboard
Allow running the reasoning-gym parameter server stand-alone (for multi-process & distributed training with a central reasoning-gym server)
Add a client class to fetch the next task & return results to the reasoning-gym server
Create a console client app to read & edit configuration params & monitor accuracy values and current capabilities
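
To make the sketch above more concrete, here is a minimal, network-free illustration of the state such a server might hold and how REST-style requests could be dispatched against it. Everything in it is an assumption for discussion: the class name `ConfigServerState`, the endpoint paths (`/config`, `/next`, `/results`, `/scores`), the `weight` parameter and the dataset names are hypothetical and not part of reasoning-gym. A real server would wrap `handle` in an HTTP framework and read the API key from a request header.

```python
import random
import threading
from typing import Any, Optional


class ConfigServerState:
    """Hypothetical in-memory state for an interactive training server."""

    def __init__(self, api_key: str, datasets: dict, seed: Optional[int] = None):
        self._api_key = api_key
        self._lock = threading.Lock()  # training workers may call concurrently
        # dataset name -> parameters; "weight" controls the sampling probability
        self._datasets = {name: dict(cfg) for name, cfg in datasets.items()}
        # dataset name -> [correct answers, total attempts]
        self._scores = {}
        self._rng = random.Random(seed)

    def handle(self, method: str, path: str, api_key: str,
               body: Optional[dict] = None) -> tuple:
        """Dispatch one REST-style request; returns (status code, JSON-like dict)."""
        if api_key != self._api_key:
            return 401, {"error": "invalid API key"}
        with self._lock:
            if method == "GET" and path == "/config":
                return 200, {"datasets": {n: dict(c) for n, c in self._datasets.items()}}
            if method == "PUT" and path.startswith("/config/"):
                # Add a dataset or update parameters of an existing one.
                name = path[len("/config/"):]
                entry = self._datasets.setdefault(name, {"weight": 1.0})
                entry.update(body or {})
                return 200, {name: dict(entry)}
            if method == "DELETE" and path.startswith("/config/"):
                name = path[len("/config/"):]
                if self._datasets.pop(name, None) is None:
                    return 404, {"error": "unknown dataset"}
                return 200, {"removed": name}
            if method == "GET" and path == "/next":
                return self._next_task()
            if method == "POST" and path == "/results":
                score = self._scores.setdefault(body["dataset"], [0, 0])
                score[0] += int(bool(body["correct"]))
                score[1] += 1
                return 200, {"ok": True}
            if method == "GET" and path == "/scores":
                return 200, {n: {"accuracy": c / t, "attempts": t}
                             for n, (c, t) in self._scores.items()}
        return 404, {"error": "unknown endpoint"}

    def _next_task(self) -> tuple:
        # Pick the dataset for the next task in proportion to its weight.
        names = list(self._datasets)
        weights = [float(self._datasets[n].get("weight", 1.0)) for n in names]
        if not names or sum(weights) <= 0:
            return 409, {"error": "no active datasets"}
        name = self._rng.choices(names, weights=weights)[0]
        params: dict[str, Any] = {k: v for k, v in self._datasets[name].items()
                                  if k != "weight"}
        return 200, {"dataset": name, "params": params}
```

In this sketch a training loop would request `GET /next` before each task and report the outcome via `POST /results`, while an operator's console client would use `PUT /config/<name>` or `DELETE /config/<name>` to change the mix during the run. Because every request reads the current state, an edit takes effect on the very next task.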

andreaskoepf added the "planning" label (Ideation phase / overview, needs breakdown) on Feb 10, 2025

andreaskoepf self-assigned this Feb 10, 2025

