Rework get_duplicate_forecasts() to S3 generic#1148
Rework get_duplicate_forecasts() to S3 generic#1148seabbs-bot wants to merge 14 commits intomainfrom
Conversation
Replace hard-coded column detection in get_duplicate_forecasts() with S3 dispatch. Each forecast type now defines get_duplicate_columns() returning the type-specific columns needed for duplicate detection. Plain data.frames retain the previous column-detection behaviour via the default method. Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1148 +/- ##
==========================================
- Coverage 97.98% 97.96% -0.02%
==========================================
Files 37 37
Lines 1984 2016 +32
==========================================
+ Hits 1944 1975 +31
- Misses 40 41 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Keep get_duplicate_forecasts() as a plain function instead of S3. Rename get_duplicate_columns() to get_forecast_type_ids() and make the default return character(0) so only types with actual ID columns need a method. Remove redundant methods for binary, point, and multivariate_point. Update tests to use forecast objects. Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Make get_forecast_type_ids() public so custom forecast types can define methods. Fix get_duplicate_forecasts() description that still referenced the old column-detection default. Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
This comment was marked as outdated.
This comment was marked as outdated.
Add a `type` parameter so users can check for duplicates on raw data.frames by specifying the forecast type (e.g. "quantile", "sample"). Internally constructs a temporary forecast object to dispatch via get_forecast_type_ids(). Calling on a plain data.frame without type emits a deprecation warning. Also update check_duplicates() to include the detected type in the suggested get_duplicate_forecasts() call in the error message. Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Keep old column-detection behaviour when calling get_duplicate_forecasts() on a plain data.frame without type, emitting a deprecation warning. Fix keyword_quote_linter by using backtick syntax for cli message names. Improve @param type docs to explain the class suffix convention. Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Add @Keywords as_forecast so pkgdown can find the topic in the reference index. Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Add lifecycle as a dependency and use lifecycle::deprecate_warn() for the deprecated plain data.frame path in get_duplicate_forecasts() instead of a manual cli_warn(). Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
The error message now shows a copy-pasteable get_duplicate_forecasts() call with both forecast_unit and type arguments so users can diagnose duplicates on their raw data without guessing parameters. Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
seabbs
left a comment
There was a problem hiding this comment.
This looks good to me. Its a shame to need the type but I don't think there is another pattern that really makes sense unless we make internal checks more verbose again
Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
nikosbosse
left a comment
There was a problem hiding this comment.
Looks good to me - only thing I'd do is consider adding the class instead of converting into a forecast object, but 🤷
| if (inherits(data, "forecast")) { | ||
| type_cols <- get_forecast_type_ids(data) | ||
| } else if (!is.null(type)) { | ||
| tmp <- new_forecast(data, paste0("forecast_", type)) |
There was a problem hiding this comment.
When exactly does the new forecast object get validated? Since the point of the function is to diagnose issues when forecast validation fails, it's of course important that this doesn't fail imediately.
There was a problem hiding this comment.
Thinking about it again. I think it should all work. I just have a vague tingling sense around data.table things and printing and validation.
Maybe something to monitor. Maybe cleaner to just add the class instead of actually making this a forecast object?
There was a problem hiding this comment.
doesn't the new_forecast basically just add the class vs doing anything else? I didn't want it to be too fake of a process.
The initial design did only work for validated objects and then I was like well this is pointless! Its a bit of a circular problem
There was a problem hiding this comment.
Happy to resolve this and merge or any thoughts on things to do here? Could also make an issue?
Summary
get_forecast_type_ids()that returns the type-specific columns (beyond the forecast unit) identifying a unique row. Only types with actual ID columns need a method (quantile, sample, multivariate_sample, nominal, ordinal); the default returnscharacter(0).get_duplicate_forecasts()now callsget_forecast_type_ids()instead of hard-codingc("sample_id", "quantile_level", "predicted_label")typeargument toget_duplicate_forecasts()for use on raw data.frames (e.g.type = "quantile"). Internally constructs a temporary forecast object vianew_forecast()to dispatch correctly.get_duplicate_forecasts()on a plain data.frame withouttypeemits a deprecation warningcheck_duplicates()error message now includes the detected type in the suggestedget_duplicate_forecasts()callget_forecast_type_idsmethodCloses #888
Test plan
get_forecast_type_ids()per forecast typeget_duplicate_forecasts()withtypeon raw datatypecheck_duplicates()error messageThis was opened by a bot. Please ping @seabbs for any questions.