Auto-Curriculum: Adaptively adjusting the complexity of tasks #17
Comments
Adding an abstract method to the
The only question is whether this is applicable to every possible procedural dataset (possibly).
We could for example add a method to A specific implementation would need to be created for each dataset. But it would be a clean way to get from a scalar value to a dataset config object. I suggest we start with the implementation for something as simple as ChainSum. @andthattoo let me know if you have time to give it a shot. :-)
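To make the "scalar value to dataset config object" idea concrete, here is a minimal sketch for a ChainSum-like task. Everything in it is an assumption made for illustration: `ChainSumConfigSketch`, its field names, and the linear scaling are hypothetical and are not taken from the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class ChainSumConfigSketch:
    """Hypothetical stand-in for a ChainSum config; field names are assumptions."""
    min_terms: int = 2
    max_terms: int = 2
    min_digits: int = 1
    max_digits: int = 1


def chain_sum_config_from_difficulty(difficulty: float) -> ChainSumConfigSketch:
    """Map a scalar difficulty in [0, 1] to a config object.

    The linear scaling and the upper bounds (8 terms, 4 digits) are
    arbitrary choices for this sketch.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range inputs
    return ChainSumConfigSketch(
        min_terms=2,
        max_terms=2 + int(d * 6),   # 2..8 terms
        min_digits=1,
        max_digits=1 + int(d * 3),  # 1..4 digits
    )
```

Each dataset would need its own mapping of this kind, since what "harder" means (more terms, more digits, deeper nesting) is task specific.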
Post-implementation: https://github.com/open-thought/reasoning-gym/discussions/27
OK, we will continue the planning under discussions: Curriculum Crafting #27
The idea of an auto-curriculum is to optimize the learning signal by adjusting the difficulty of tasks depending on the model's capabilities.
Training tasks should be neither too hard nor too easy; see, e.g., concepts from psychology such as the zone of proximal development.
In a naive setup with a dataset that contains problems of all difficulty levels, RL training will initially be exposed to many tasks that the model cannot solve, while at a later stage the model might ace many of the simpler ones, which then no longer provide any new information.
These parts are needed:
For task difficulty adjustment we could add another (abstract) method to the ProceduralDataset base class, which could be implemented in a task/dataset-dependent form in the derived classes.
An extended form of the curriculum decorator should be able to adjust the difficulty of a dataset collection, potentially also adjusting the frequency at dataset level for batch sampling.
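A rough sketch of how these pieces could fit together, assuming a scalar difficulty in [0, 1] and a success rate measured over recent rollouts. The names `ProceduralDatasetSketch`, `set_difficulty`, `AutoCurriculumSketch`, and `sampling_weights`, as well as the thresholds and the weighting rule, are hypothetical and only illustrate the proposal; they are not the project's API.

```python
from abc import ABC, abstractmethod


class ProceduralDatasetSketch(ABC):
    """Hypothetical stand-in for the base class, not the real implementation."""

    @abstractmethod
    def set_difficulty(self, difficulty: float) -> None:
        """Reconfigure task generation for a scalar difficulty in [0, 1]."""


class ToySumDataset(ProceduralDatasetSketch):
    """Toy derived class: difficulty controls the number of summands."""

    def __init__(self) -> None:
        self.num_terms = 2

    def set_difficulty(self, difficulty: float) -> None:
        d = min(max(difficulty, 0.0), 1.0)
        self.num_terms = 2 + int(d * 6)


class AutoCurriculumSketch:
    """Raise difficulty when tasks are too easy, lower it when too hard."""

    def __init__(self, dataset, low=0.3, high=0.7, step=0.1):
        self.dataset = dataset
        self.low, self.high, self.step = low, high, step
        self.difficulty = 0.0
        dataset.set_difficulty(self.difficulty)

    def update(self, success_rate: float) -> float:
        if success_rate > self.high:
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif success_rate < self.low:
            self.difficulty = max(0.0, self.difficulty - self.step)
        self.dataset.set_difficulty(self.difficulty)
        return self.difficulty


def sampling_weights(success_rates: dict, floor: float = 0.05) -> dict:
    """Assumed heuristic: sample datasets more often when their success
    rate is near 0.5, where the learning signal is presumably strongest."""
    raw = {k: max(floor, 1.0 - 2.0 * abs(r - 0.5)) for k, r in success_rates.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}
```

In a training loop, `update` would be called once per evaluation window with the measured success rate, and `sampling_weights` could drive how often each dataset in a collection is drawn for a batch.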