Vision:
Launch a training run and use CLI commands (or a web frontend) to monitor and manipulate the reasoning-gym dataset configuration, directly controlling the composition of the next batch: for example, add or remove datasets from a Composite, or change the difficulty. Tune configuration parameters and immediately see the effect in the current training or eval run. The longer-term goal is LLM training in which humans oversee an LLM as it evolves during RL and steer its development in the desired direction.
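To make the idea concrete, here is a minimal sketch of a configuration object that can be changed while a run is in progress, with the next batch drawn from whatever the configuration says at that moment. It uses only the Python standard library and does not use the reasoning-gym API; `LiveCompositeConfig`, `compose_batch`, and the dataset names are hypothetical placeholders.

```python
import random
import threading


class LiveCompositeConfig:
    """Hypothetical thread-safe holder for dataset weights and difficulty."""

    def __init__(self, weights, difficulty=1):
        self._lock = threading.Lock()
        self._weights = dict(weights)
        self._difficulty = difficulty

    def set_weight(self, name, weight):
        # A weight of 0 (or less) removes the dataset from the mix.
        with self._lock:
            if weight <= 0:
                self._weights.pop(name, None)
            else:
                self._weights[name] = weight

    def set_difficulty(self, level):
        with self._lock:
            self._difficulty = level

    def snapshot(self):
        # Copy under the lock so a batch is built from one consistent state.
        with self._lock:
            return dict(self._weights), self._difficulty


def compose_batch(config, batch_size, rng):
    """Pick (dataset name, difficulty) pairs for the next batch."""
    weights, difficulty = config.snapshot()
    if not weights:
        raise ValueError("no datasets are currently enabled")
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=batch_size)
    return [(name, difficulty) for name in picks]


if __name__ == "__main__":
    cfg = LiveCompositeConfig({"dataset_a": 1.0, "dataset_b": 1.0})
    rng = random.Random(0)
    print(compose_batch(cfg, 4, rng))
    cfg.set_weight("dataset_b", 0)  # operator removes a dataset mid-run
    cfg.set_difficulty(3)           # and raises the difficulty
    print(compose_batch(cfg, 4, rng))
```

Because each batch is built from a fresh snapshot, an edit made between two training steps applies to the very next batch without restarting the run.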
Implementation sketch:
expose a REST API (access controlled via API key)
offer endpoints to read and manipulate the active configuration and the score board
allow running the reasoning-gym parameter server standalone (for multi-process and distributed training against a central reasoning-gym server)
add a client class that fetches the next task from the reasoning-gym server and returns results to it
create a console client app to read and edit configuration parameters and to monitor accuracy values and current capabilities
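The points above could be prototyped roughly as follows. This is a sketch under stated assumptions, not a proposed final design: it uses only the Python standard library, and the routes (`/config`, `/scores`, `/results`), the `X-API-Key` header, and all names in the code are hypothetical choices made for illustration. The routing logic lives in a plain function so it can be exercised without opening a socket.

```python
import copy
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

API_KEY = "change-me"  # hypothetical; would be loaded from the environment
_lock = threading.Lock()
_state = {
    "config": {"difficulty": 1, "weights": {"dataset_a": 1.0}},
    "scores": {},  # dataset name -> list of reported scores
}


def handle_request(method, path, api_key, payload=None):
    """Route one request; returns (HTTP status, JSON-serialisable body)."""
    if api_key != API_KEY:
        return 401, {"error": "invalid API key"}
    with _lock:
        if method == "GET" and path == "/config":
            return 200, copy.deepcopy(_state["config"])
        if method == "GET" and path == "/scores":
            # Score board: mean reported score per dataset.
            return 200, {k: sum(v) / len(v) for k, v in _state["scores"].items()}
        if method == "PUT" and path == "/config":
            if not isinstance(payload, dict):
                return 400, {"error": "expected a JSON object"}
            _state["config"].update(payload)
            return 200, copy.deepcopy(_state["config"])
        if method == "POST" and path == "/results":
            if not isinstance(payload, dict) or not {"dataset", "score"} <= payload.keys():
                return 400, {"error": "expected 'dataset' and 'score'"}
            scores = _state["scores"].setdefault(payload["dataset"], [])
            scores.append(float(payload["score"]))
            return 200, {"ok": True}
    return 404, {"error": "unknown route"}


class Handler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_request."""

    def _dispatch(self):
        length = int(self.headers.get("Content-Length") or 0)
        payload = json.loads(self.rfile.read(length)) if length else None
        status, body = handle_request(
            self.command, self.path, self.headers.get("X-API-Key"), payload
        )
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    do_GET = do_PUT = do_POST = _dispatch


def serve(host="127.0.0.1", port=8000):
    """Run the parameter server standalone (blocks until interrupted)."""
    ThreadingHTTPServer((host, port), Handler).serve_forever()
```

A console client or training worker would then send `GET /config` before composing a batch, `POST /results` after scoring it, and `PUT /config` when an operator changes a parameter, each with the API key in the request header.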