Added 7 tasks for calendar arithmetic.
For prompt diversity, each task has 3 question templates, and all tasks share a pool of format requests, one of which is selected at random (e.g., specifying whether the expected answer is a weekday name or a numerical value).
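To illustrate the idea, here is a minimal sketch of how a question template and a format request could be combined. The template strings, the format requests, and the `build_prompt` helper are all hypothetical and are not taken from the PR.

```python
import random

# Hypothetical question templates for one task (the PR uses 3 per task).
TEMPLATES = [
    "What day of the week is {date}?",
    "On which weekday does {date} fall?",
    "Name the day of the week for {date}.",
]

# Hypothetical pool of format requests shared across tasks.
FORMAT_REQUESTS = [
    "Answer with the full weekday name.",
    "Answer with a number, where Monday = 1 and Sunday = 7.",
]


def build_prompt(date_str: str, rng: random.Random) -> str:
    """Pick one template and one format request at random and join them."""
    template = rng.choice(TEMPLATES)
    format_request = rng.choice(FORMAT_REQUESTS)
    return f"{template.format(date=date_str)} {format_request}"


print(build_prompt("2024-03-15", random.Random(0)))
```

Passing a seeded `random.Random` keeps generated prompts reproducible across runs.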
The calendar arithmetic problems (generally) span a year, specified in the config. The tasks have different inherent complexities, with three parameters that can be tuned to adjust the difficulty:
Tunable:
- "is_leap_year", tuned with `leap_year_range`
- The difficulty of the next four tasks depends on `offset_upper_bound`. They can range from easy to hard by setting the offset bound anywhere from a few days to an arbitrarily large number.
  - "weekday_offset"
  - "count_business_days"
  - "count_days"
  - "weekday_of_date_from_first_day" (offset is capped to fit in a year)

Not tunable (and hard, especially if the given year is not in the training distribution of the LLM):
- "weekday_of_date"
- "recurring_event_day" (harder)
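For reference, a minimal sketch of how ground-truth answers for a few of these tasks could be computed with the standard library. The function signatures are assumptions made for illustration, not the PR's actual code; in particular, the business-day count here assumes an inclusive date range and ignores holidays.

```python
import calendar
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def is_leap_year(year: int) -> bool:
    """Gregorian leap-year rule, delegated to the standard library."""
    return calendar.isleap(year)


def weekday_offset(start: date, offset_days: int) -> str:
    """Weekday name of the date `offset_days` after `start`."""
    target = start + timedelta(days=offset_days)
    return WEEKDAYS[target.weekday()]  # date.weekday(): Monday == 0


def count_business_days(start: date, end: date) -> int:
    """Count Monday-Friday days in [start, end], inclusive (assumption)."""
    total_days = (end - start).days + 1
    return sum(
        1
        for i in range(total_days)
        if (start + timedelta(days=i)).weekday() < 5
    )


print(is_leap_year(2024))                                       # True
print(weekday_offset(date(2024, 1, 1), 10))                     # Thursday
print(count_business_days(date(2024, 1, 1), date(2024, 1, 14)))  # 10
```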
I also created a denser reward for numerical answers and for strings (e.g., wrong capitalization shouldn't receive a reward of 0).
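A minimal sketch of what such a denser reward might look like. The partial-credit value of 0.5 and the `1 / (1 + error)` decay are assumptions chosen for illustration; the scoring used in the PR may differ.

```python
def string_reward(answer: str, expected: str) -> float:
    """Full credit for an exact match, partial credit for a case-only mismatch."""
    answer, expected = answer.strip(), expected.strip()
    if answer == expected:
        return 1.0
    if answer.lower() == expected.lower():
        return 0.5  # hypothetical partial credit for wrong capitalization
    return 0.0


def numeric_reward(answer: str, expected: int) -> float:
    """Reward that decays with the absolute error instead of dropping to 0."""
    try:
        value = int(answer.strip())
    except ValueError:
        return 0.0  # not parseable as an integer
    error = abs(value - expected)
    return 1.0 / (1.0 + error)  # hypothetical decay: exact answer gives 1.0


print(string_reward("monday", "Monday"))  # 0.5
print(numeric_reward("9", 10))            # 0.5
```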