Avoid repeated and unnecessary quality checks during forecast scoring #612
Potential approaches and considerations from discussion between @brookslogan and @nmdefries:

Scoring is particularly slow when processing many forecasts at once, as in the forecast eval pipeline. For forecast eval purposes, it makes sense to move the checks to the […]. However, this isn't a good general solution, since it makes assumptions about data structure that might not be true for custom error measure fns.

A general solution might be to use some sort of caching. A naive idea: could we attach check output to the data obj the first time the checks are run? The two check fns would be called on each data group, and the output (including assertion failures) attached as attributes to the chunk, to effectively use as a cache the next time the checks are called. To do this, the checks would have to be run before any error measures, or the error measures would always have to pass back a potentially-modified data obj.
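A minimal sketch of that caching idea, assuming hypothetical helper and check signatures (with_cached_checks, and the way the check fns are called on a whole chunk here, are illustrative only, not evalcast's actual API):

```r
# Sketch only: cache check results on a data chunk as an attribute so that
# later error measures can skip re-running the checks. Names are hypothetical.
with_cached_checks <- function(chunk, check_fns) {
  if (!is.null(attr(chunk, "check_results"))) {
    return(chunk)  # checks already ran on this chunk; the cache is in place
  }
  results <- lapply(check_fns, function(f) {
    tryCatch(f(chunk), error = function(e) e)  # record assertion failures too
  })
  attr(chunk, "check_results") <- results
  chunk  # callers must keep this modified chunk for the cache to take effect
}

# Hypothetical use inside an error measure (the real check fns take more
# specific arguments than a whole chunk):
# chunk   <- with_cached_checks(chunk, list(is_symmetric, score_func_param_checker))
# results <- attr(chunk, "check_results")
```

The last line of the helper is exactly the catch noted above: either the checks run once up front, or each error measure has to hand the modified chunk back to its caller.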
When the chunks are combined, the attributes will need to be combined, and I think the way dplyr does this by default is to just keep the first or last attribute, not combine them. We'd either need special classes (maybe a vctrs vector implementation), or maybe store things first as a list (using either special objects in the list, or attributes attached to objects in the list, to indicate errors/warnings), and then manually unchop that column. Or there might be some similar helper functionality for this in […].

The above seems potentially easiest, but we might also think of registering checker functions for each metric, and then have a step that runs all unique checks required by the selected metrics, then runs all the metrics.
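A rough sketch of the registration idea, again with hypothetical names (check_registry, metric_registry, run_metrics, and the chunk columns are all illustrative, not evalcast's API):

```r
# Sketch only: each metric declares which checks it needs; the union of those
# checks runs once per chunk before any metrics run.
check_registry <- list(
  params    = function(chunk) score_func_param_checker(chunk),  # adapt args to the real signature
  symmetric = function(chunk) is_symmetric(chunk)               # adapt args to the real signature
)
metric_registry <- list(
  abs_err = list(fn = function(chunk) mean(abs(chunk$actual - chunk$forecast)),
                 checks = c("params")),
  sq_err  = list(fn = function(chunk) mean((chunk$actual - chunk$forecast)^2),
                 checks = c("params", "symmetric"))
)

run_metrics <- function(chunk, metric_names) {
  selected <- metric_registry[metric_names]
  needed   <- unique(unlist(lapply(selected, `[[`, "checks")))
  for (chk in needed) check_registry[[chk]](chunk)   # each required check runs once per chunk
  lapply(selected, function(m) m$fn(chunk))          # then all selected metrics
}
```

This would also address the point below that not every error measure needs every check, since a metric only pulls in the checks it declares.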
We shouldn't need to combine the data chunks themselves or preserve the check-result attributes long term. In the old approach, each chunk (forecaster + some other stuff) is passed separately to […].
Our current approach evaluates all error measures at once, in a single […]. Would modifying chunks cause much-increased memory usage? I'm thinking that adding an attribute to a df (whose underlying implementation is a list) will trigger only a shallow copy, but we may need to be careful about this.
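One way to check the shallow-copy intuition (a generic R demonstration, assuming the lobstr package is available; not tied to evalcast):

```r
library(lobstr)

df <- data.frame(value = runif(1e6))
col_addr_before <- obj_addr(df$value)

# Setting an attribute copies the data frame's list-of-columns shell,
# but the column vectors themselves are shared, so the memory cost is small.
attr(df, "check_results") <- list(ok = TRUE)
col_addr_after <- obj_addr(df$value)

identical(col_addr_before, col_addr_after)  # TRUE: the column was not duplicated
```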
Most of the runtime in scoring forecasts is spent in the checking functions, is_symmetric and score_func_param_checker. The check functions are called by each evalcast error measure, so they're run repeatedly for each chunk of data. Besides the checks being repeated across error measures, not every error measure needs all the checks contained in score_func_param_checker. How best to avoid unnecessary and repeated checks isn't immediately obvious.
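For concreteness, a hedged sketch of the call pattern described above, with hypothetical measure and column names (not evalcast's actual code): with M error measures and G data chunks, the checks execute roughly M × G times.

```r
# Sketch only: count how often a check runs when every error measure calls it
# on every data chunk. All names here are hypothetical, not evalcast code.
check_calls <- 0L
dummy_check <- function(chunk) { check_calls <<- check_calls + 1L; invisible(TRUE) }

measure_a <- function(chunk) { dummy_check(chunk); mean(abs(chunk$err)) }
measure_b <- function(chunk) { dummy_check(chunk); mean(chunk$err^2) }

chunks <- split(data.frame(err = rnorm(100)), rep(1:10, each = 10))
for (ch in chunks) { measure_a(ch); measure_b(ch) }
check_calls  # 20: 2 measures x 10 chunks, though one run per chunk would suffice
```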