Avoid repeated and unnecessary quality checks during forecast scoring #612
Potential approaches and considerations from discussion between @brookslogan and @nmdefries:

Scoring is particularly slow when processing many forecasts at once, as in the forecast eval pipeline. For forecast eval purposes, it makes sense to move the checks to the […]. However, this isn't a good general solution, since it makes assumptions about data structure that might not be true for custom error measure fns.

A general solution might be to use some sort of caching. A naive idea: could we attach check output to the data obj the first time the checks are run? The two check fns would be called on each data group, and the output (including assertion failures) attached as attributes to the chunk, to effectively use as a cache the next time the checks are called. To do this, the checks would have to be run before any error measures, or the error measures would always have to pass back a potentially-modified data obj.
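A minimal sketch of that caching idea, assuming hypothetical helper and check signatures (with_cached_checks, and the way the check fns are called on a whole chunk here, are illustrative only, not evalcast's actual API):

```r
# Sketch only: cache check results on a data chunk as an attribute so that
# later error measures can skip re-running the checks. Names are hypothetical.
with_cached_checks <- function(chunk, check_fns) {
  if (!is.null(attr(chunk, "check_results"))) {
    return(chunk)  # checks already ran on this chunk; the cache is in place
  }
  results <- lapply(check_fns, function(f) {
    tryCatch(f(chunk), error = function(e) e)  # record assertion failures too
  })
  attr(chunk, "check_results") <- results
  chunk  # callers must keep this modified chunk for the cache to take effect
}

# Hypothetical use inside an error measure (the real check fns take more
# specific arguments than a whole chunk):
# chunk   <- with_cached_checks(chunk, list(is_symmetric, score_func_param_checker))
# results <- attr(chunk, "check_results")
```

The last line of the helper is exactly the catch noted above: either the checks run once up front, or each error measure has to hand the modified chunk back to its caller.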
When the chunks are combined, the attributes will need to be combined, and I think the way dplyr does this by default is to just keep the first or last attribute, not combine them. We'd either need special classes (maybe a vctrs vector implementation), or maybe store things first as a list (using either special objects in the list, or attributes attached to objects in the list, to indicate errors/warnings), and then manually unchop that column. Or there might be some similar helper functionality for this in […].

The above seems potentially easiest, but we might also think of registering checker functions for each metric, and then have a step that runs all unique checks required by the selected metrics, then runs all the metrics.
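A rough sketch of the registration idea, again with hypothetical names (check_registry, metric_registry, run_metrics, and the chunk columns are all illustrative, not evalcast's API):

```r
# Sketch only: each metric declares which checks it needs; the union of those
# checks runs once per chunk before any metrics run.
check_registry <- list(
  params    = function(chunk) score_func_param_checker(chunk),  # adapt args to the real signature
  symmetric = function(chunk) is_symmetric(chunk)               # adapt args to the real signature
)
metric_registry <- list(
  abs_err = list(fn = function(chunk) mean(abs(chunk$actual - chunk$forecast)),
                 checks = c("params")),
  sq_err  = list(fn = function(chunk) mean((chunk$actual - chunk$forecast)^2),
                 checks = c("params", "symmetric"))
)

run_metrics <- function(chunk, metric_names) {
  selected <- metric_registry[metric_names]
  needed   <- unique(unlist(lapply(selected, `[[`, "checks")))
  for (chk in needed) check_registry[[chk]](chunk)   # each required check runs once per chunk
  lapply(selected, function(m) m$fn(chunk))          # then all selected metrics
}
```

This would also address the point below that not every error measure needs every check, since a metric only pulls in the checks it declares.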
We shouldn't need to combine the data chunks themselves or preserve the check-result attributes long term. In the old approach, each chunk (forecaster + some other stuff) is passed separately to […].
Our current approach evaluates all error measures at once, in a single […]. Would modifying chunks cause much-increased memory usage? I'm thinking that adding an attribute to a df (whose underlying implementation is a list) will trigger only a shallow copy, but we may need to be careful about this.
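One way to check the shallow-copy intuition (a generic R demonstration, assuming the lobstr package is available; not tied to evalcast):

```r
library(lobstr)

df <- data.frame(value = runif(1e6))
col_addr_before <- obj_addr(df$value)

# Setting an attribute copies the data frame's list-of-columns shell,
# but the column vectors themselves are shared, so the memory cost is small.
attr(df, "check_results") <- list(ok = TRUE)
col_addr_after <- obj_addr(df$value)

identical(col_addr_before, col_addr_after)  # TRUE: the column was not duplicated
```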
Most of the runtime in scoring forecasts is spent in the checking functions, is_symmetric and score_func_param_checker. The check functions are called by each evalcast error measure, so they're run repeatedly for each chunk of data. Besides the checks being repeated across error measures, not every error measure needs all the checks contained in score_func_param_checker. How best to avoid unnecessary and repeated checks isn't immediately obvious.
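For concreteness, a hedged sketch of the call pattern described above, with hypothetical measure and column names (not evalcast's actual code): with M error measures and G data chunks, the checks execute roughly M × G times.

```r
# Sketch only: count how often a check runs when every error measure calls it
# on every data chunk. All names here are hypothetical, not evalcast code.
check_calls <- 0L
dummy_check <- function(chunk) { check_calls <<- check_calls + 1L; invisible(TRUE) }

measure_a <- function(chunk) { dummy_check(chunk); mean(abs(chunk$err)) }
measure_b <- function(chunk) { dummy_check(chunk); mean(chunk$err^2) }

chunks <- split(data.frame(err = rnorm(100)), rep(1:10, each = 10))
for (ch in chunks) { measure_a(ch); measure_b(ch) }
check_calls  # 20: 2 measures x 10 chunks, though one run per chunk would suffice
```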