Skip to content

Rework get_duplicate_forecasts() to S3 generic#1148

Open
seabbs-bot wants to merge 14 commits intomainfrom
feat-issue-5
Open

Rework get_duplicate_forecasts() to S3 generic#1148
seabbs-bot wants to merge 14 commits intomainfrom
feat-issue-5

Conversation

@seabbs-bot
Copy link
Copy Markdown
Collaborator

@seabbs-bot seabbs-bot commented Mar 30, 2026

Summary

  • Adds exported S3 generic get_forecast_type_ids() that returns the type-specific columns (beyond the forecast unit) identifying a unique row. Only types with actual ID columns need a method (quantile, sample, multivariate_sample, nominal, ordinal); the default returns character(0).
  • get_duplicate_forecasts() now calls get_forecast_type_ids() instead of hard-coding c("sample_id", "quantile_level", "predicted_label")
  • Adds a type argument to get_duplicate_forecasts() for use on raw data.frames (e.g. type = "quantile"). Internally constructs a temporary forecast object via new_forecast() to dispatch correctly.
  • Calling get_duplicate_forecasts() on a plain data.frame without type emits a deprecation warning
  • check_duplicates() error message now includes the detected type in the suggested get_duplicate_forecasts() call
  • New forecast types only need to define a get_forecast_type_ids method

Closes #888

Test plan

  • All 742 tests pass
  • New tests for get_forecast_type_ids() per forecast type
  • New tests for get_duplicate_forecasts() with type on raw data
  • Test for deprecation warning on raw data without type
  • Test for type hint in check_duplicates() error message
  • Linter passes on all changed files

This was opened by a bot. Please ping @seabbs for any questions.

Replace hard-coded column detection in get_duplicate_forecasts() with
S3 dispatch. Each forecast type now defines get_duplicate_columns()
returning the type-specific columns needed for duplicate detection.
Plain data.frames retain the previous column-detection behaviour via
the default method.

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.96%. Comparing base (4a222da) to head (b1d82c5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1148      +/-   ##
==========================================
- Coverage   97.98%   97.96%   -0.02%     
==========================================
  Files          37       37              
  Lines        1984     2016      +32     
==========================================
+ Hits         1944     1975      +31     
- Misses         40       41       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Keep get_duplicate_forecasts() as a plain function instead of S3.
Rename get_duplicate_columns() to get_forecast_type_ids() and make
the default return character(0) so only types with actual ID columns
need a method. Remove redundant methods for binary, point, and
multivariate_point. Update tests to use forecast objects.

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
@seabbs seabbs requested review from nikosbosse, sbfnk and seabbs March 30, 2026 15:09
Make get_forecast_type_ids() public so custom forecast types can
define methods. Fix get_duplicate_forecasts() description that
still referenced the old column-detection default.

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
@seabbs seabbs removed request for nikosbosse and sbfnk March 30, 2026 15:21
@seabbs

This comment was marked as outdated.

seabbs and others added 4 commits March 30, 2026 17:23
Add a `type` parameter so users can check for duplicates on raw
data.frames by specifying the forecast type (e.g. "quantile",
"sample"). Internally constructs a temporary forecast object to
dispatch via get_forecast_type_ids(). Calling on a plain data.frame
without type emits a deprecation warning.

Also update check_duplicates() to include the detected type in the
suggested get_duplicate_forecasts() call in the error message.

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Keep old column-detection behaviour when calling
get_duplicate_forecasts() on a plain data.frame without type,
emitting a deprecation warning. Fix keyword_quote_linter by using
backtick syntax for cli message names. Improve @param type docs
to explain the class suffix convention.

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Add @Keywords as_forecast so pkgdown can find the topic in
the reference index.

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
seabbs and others added 3 commits March 30, 2026 19:24
Add lifecycle as a dependency and use lifecycle::deprecate_warn()
for the deprecated plain data.frame path in get_duplicate_forecasts()
instead of a manual cli_warn().

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
The error message now shows a copy-pasteable
get_duplicate_forecasts() call with both forecast_unit and type
arguments so users can diagnose duplicates on their raw data
without guessing parameters.

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Copy link
Copy Markdown
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Its a shame to need the type but I don't think there is another pattern that really makes sense unless we make internal checks more verbose again

@seabbs seabbs requested review from nikosbosse and sbfnk March 30, 2026 18:34
Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
@seabbs seabbs enabled auto-merge March 30, 2026 18:40
Copy link
Copy Markdown
Collaborator

@nikosbosse nikosbosse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - only thing I'd do is consider adding the class instead of converting into a forecast object, but 🤷

if (inherits(data, "forecast")) {
type_cols <- get_forecast_type_ids(data)
} else if (!is.null(type)) {
tmp <- new_forecast(data, paste0("forecast_", type))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When exactly does the new forecast object get validated? Since the point of the function is to diagnose issues when forecast validation fails, it's of course important that this doesn't fail imediately.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it again. I think it should all work. I just have a vague tingling sense around data.table things and printing and validation.
Maybe something to monitor. Maybe cleaner to just add the class instead of actually making this a forecast object?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't the new_forecast basically just add the class vs doing anything else? I didn't want it to be too fake of a process.

The initial design did only work for validated objects and then I was like well this is pointless! Its a bit of a circular problem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to resolve this and merge or any thoughts on things to do here? Could also make an issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rework get_duplicate_forecasts() to S3 to avoid hard-coding columns and types

3 participants