Skip to content

tests(anomaly-models): Fix unit tests around anomaly models#123

Merged
tonywu1999 merged 4 commits intodevelfrom
fix-tests
Apr 2, 2026
Merged

tests(anomaly-models): Fix unit tests around anomaly models#123
tonywu1999 merged 4 commits intodevelfrom
fix-tests

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented Apr 2, 2026

Motivation and Context

This line looks to assume that if certain rows have the same quality metrics + engineered metrics, then they should be lumped together when running an isolation forest (I guess that makes sense in a way, if metrics are EXACTLY the same, that indicates those rows are not necessarily independent in the way they were identified).

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

Motivation and Context

Rows with identical/clumped quality metrics were being grouped before running the isolation-forest anomaly model, causing brittle anomaly-score unit tests and unexpected behavior. The PR fixes a minor data.table assignment bug in DIANN cleaning that ensured a variable column name is used correctly, tightens Q-value cutoff assertions in tests, adjusts synthetic anomaly-test vectors to avoid long runs of identical values, and adds an explicit test for duplicated/clumped metric behavior.

Solution summary

  • Corrected a data.table assignment so the quantification column variable is interpreted as a column name.
  • Replaced fragile row-count-based Q-value assertions with a helper that verifies rows above cutoff have Intensity == 0 and rows below cutoff account for remaining rows.
  • Adjusted synthetic anomaly test vectors to reduce long runs of identical values and added a test that explicitly checks behavior when many rows share identical/clumped metrics.

Detailed changes

  • R/clean_DIANN.R

    • In .cleanDIANNCleanAndFilterData (and related processing), ensured variable quantificationColumn is used correctly in data.table assignments by using parenthesized-column form: (quantificationColumn) := ...
    • Kept other filtering, logging, and MBR/non-MBR branching unchanged.
  • inst/tinytest/test_clean_DIANN.R

    • Added local helper expect_qvalue_cutoff(output, col, cutoff) that:
      • Asserts all rows with output[[col]] > cutoff have Intensity == 0.
      • Asserts rows with output[[col]] <= cutoff account for all remaining rows.
    • Replaced previous direct count assertions with calls to expect_qvalue_cutoff for:
      • DetectionQValue (global_qvalue_cutoff)
      • LibQValue (qvalue_cutoff)
      • LibPGQValue (pg_qvalue_cutoff)
      • GlobalQValue and GlobalPGQValue when MBR = FALSE
  • inst/tinytest/test_converters_DIANNtoMSstatsFormat.R

    • Updated expected output row counts to reflect corrected filtering/processing:
      • First DIANN converter test: expected nrow = 348
      • DIANN 2.0 test: expected nrow = 192
  • inst/tinytest/test_utils_anomaly_score.R

    • Modified synthetic quality-metric vectors to avoid long runs of identical values:
      • "Progressively higher cumulative sums": replaced rep(0.1,5) with seq(0,0.1,length.out=5) + high tail.
      • "Extreme outlier": replaced rep(0.1,19) with seq(0,0.1,length.out=19) before the extreme value.
      • "Obvious ranking": first three levels changed from identical low values to slightly increasing (0.1, 0.11, 0.12).
    • Added new Test 11 that constructs duplicated/clumped sequences across metrics and asserts that clumped high values produce lower average anomaly scores than isolated low values (explicitly exercising grouping/duplicity behavior of the anomaly model).
    • Preserved existing property-, extreme-, and reproducibility-focused anomaly tests with adjusted inputs.

Unit tests added or modified

  • test_clean_DIANN.R

    • Added expect_qvalue_cutoff helper and invoked it across global/lib/PG Q-value cutoff scenarios (including MBR = FALSE cases).
  • test_converters_DIANNtoMSstatsFormat.R

    • Updated two row-count expectations (348 and 192) to match corrected output.
  • test_utils_anomaly_score.R

    • Adjusted several synthetic vectors to avoid long identical-value runs.
    • Added Test 11 to explicitly verify anomaly-model behavior when metrics are duplicated/clumped.
    • Existing tests retained with input adjustments to improve robustness.

Coding guidelines

  • No coding guideline violations detected in the changed files; the data.table parenthesized-column assignment follows established patterns used elsewhere in the codebase.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8a78b2dc-0306-4988-9deb-7c4df11dc257

📥 Commits

Reviewing files that changed from the base of the PR and between 539229d and 1b796ca.

📒 Files selected for processing (1)
  • inst/tinytest/test_clean_DIANN.R
🚧 Files skipped from review as they are similar to previous changes (1)
  • inst/tinytest/test_clean_DIANN.R

📝 Walkthrough

Walkthrough

Fixes a data.table assignment in DIANN cleaning so a variable quantificationColumn is evaluated as a column name when zeroing intensities for rows above q-value cutoffs; updates related unit tests (q-value checks, converter row counts, anomaly-score inputs).

Changes

Cohort / File(s) Summary
DIANN Cleaning Logic
R/clean_DIANN.R
Changed a data.table assignment from dn_input[QValue >= global_qvalue_cutoff, quantificationColumn := 0] to dn_input[QValue >= global_qvalue_cutoff, (quantificationColumn) := 0] so the variable is evaluated as the target column when setting intensities to zero.
DIANN Cleaning Tests
inst/tinytest/test_clean_DIANN.R
Added local helper expect_qvalue_cutoff(output, col, cutoff) and refactored assertions to validate that rows above cutoffs have Intensity == 0; adjusted MBR-related test scenarios to use global q-value columns with updated cutoff values.
Converter Tests
inst/tinytest/test_converters_DIANNtoMSstatsFormat.R
Updated expected output row counts: first test 174 → 348 rows, DIANN 2.0 case 180 → 192 rows.
Anomaly Score Tests
inst/tinytest/test_utils_anomaly_score.R
Modified synthetic quality-metric vectors (use gradual ramps, slight increases) across several tests and added a new Test 11 that asserts relative anomaly-score averages for duplicated/clumped sequences.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • mstaniak

Poem

🐰 A parenthesis snug, a quiet fix,
Names now land where code expects.
Zeros placed where q-values soar,
Tests updated, numbers more.
🥕🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The description addresses motivation and context with a specific technical reference but omits a detailed bullet-point list of changes in the Changes section and lacks specifics about which unit tests were added/modified. Expand the Changes section with a detailed bullet-point list of all modifications across test files and add specific test names/purposes in the Testing section to clarify what was added or modified.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'tests(anomaly-models): Fix unit tests around anomaly models' accurately reflects the primary change: modifications to unit tests related to anomaly model behavior, as evidenced by changes across multiple test files and adjustments to synthetic quality-metric vectors.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tonywu1999 tonywu1999 requested a review from devonjkohler April 2, 2026 16:43
@tonywu1999 tonywu1999 changed the title Fix tests tests(anomaly-models): Fix unit tests around anomaly models Apr 2, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
R/clean_DIANN.R (1)

174-178: Log message wording may cause confusion.

The log message says 'Filtering on Q.Value < 0.01' which implies rows are kept when QValue < cutoff. The code correctly zeros out intensities for rows where QValue >= cutoff, but the log phrasing could be clearer. Consider rewording to something like 'Zeroing intensity for rows with Q.Value >= 0.01' to match the actual operation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/clean_DIANN.R` around lines 174 - 178, The log message created in msg
currently reads as if rows with QValue < global_qvalue_cutoff are being kept,
but the code actually sets (quantificationColumn) to 0 for rows with QValue >=
global_qvalue_cutoff; update the wording assigned to msg (and both
getOption("MSstatsLog") and getOption("MSstatsMsg") calls) to clearly state that
intensities are being zeroed for rows with Q.Value >= global_qvalue_cutoff
(referencing variables msg, global_qvalue_cutoff, dn_input, QValue, and
quantificationColumn) so the log matches the operation performed.
inst/tinytest/test_clean_DIANN.R (1)

38-45: Second assertion is a mathematical tautology.

This assertion checks that sum(output[[col]] <= cutoff) equals nrow(output) - sum(output[[col]] > cutoff), which is always true by definition (rows either satisfy > cutoff or <= cutoff). It doesn't test any actual behavior of the filtering logic.

If the intent is to verify that rows below the cutoff retain their original (non-zero) intensity values, consider:

     expect_equal(
-        sum(output[[col]] <= cutoff),
-        nrow(output) - sum(output[[col]] > cutoff),
+        sum(output[[intensity_col]] > 0 & output[[col]] <= cutoff),
+        sum(output[[col]] <= cutoff),
         info = sprintf(
-            "Rows with %s <= %s should account for all rows not above the cutoff",
-            col, cutoff
+            "Rows with %s <= %s should retain non-zero %s",
+            col, cutoff, intensity_col
         )
     )

This would verify that rows passing the cutoff have non-zero intensity (assuming the source data has non-zero values).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@inst/tinytest/test_clean_DIANN.R` around lines 38 - 45, The current
expect_equal comparing sum(output[[col]] <= cutoff) to nrow(output) -
sum(output[[col]] > cutoff) is a tautology and should be replaced with an
assertion that actually tests the data: update the test to assert that all
intensity values for rows that pass the cutoff are non-zero, e.g. use output,
col and cutoff to check either expect_true(all(output[[col]][output[[col]] <=
cutoff] != 0)) or expect_equal(sum(output[[col]][output[[col]] <= cutoff] != 0),
sum(output[[col]] <= cutoff)) so the test verifies retained rows have non-zero
intensities.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@inst/tinytest/test_clean_DIANN.R`:
- Line 29: The function expect_qvalue_cutoff currently has intensity_col = NULL
which can mask failures because output[[NULL]] returns NULL; change the
signature of expect_qvalue_cutoff to default intensity_col to "Intensity" (i.e.,
intensity_col = "Intensity") and ensure all internal uses of intensity_col
(references to output[[intensity_col]] and any sum(...) or comparison logic)
operate on that non-NULL default so tests fail loudly when the Intensity column
is missing rather than silently passing.

---

Nitpick comments:
In `@inst/tinytest/test_clean_DIANN.R`:
- Around line 38-45: The current expect_equal comparing sum(output[[col]] <=
cutoff) to nrow(output) - sum(output[[col]] > cutoff) is a tautology and should
be replaced with an assertion that actually tests the data: update the test to
assert that all intensity values for rows that pass the cutoff are non-zero,
e.g. use output, col and cutoff to check either
expect_true(all(output[[col]][output[[col]] <= cutoff] != 0)) or
expect_equal(sum(output[[col]][output[[col]] <= cutoff] != 0), sum(output[[col]]
<= cutoff)) so the test verifies retained rows have non-zero intensities.

In `@R/clean_DIANN.R`:
- Around line 174-178: The log message created in msg currently reads as if rows
with QValue < global_qvalue_cutoff are being kept, but the code actually sets
(quantificationColumn) to 0 for rows with QValue >= global_qvalue_cutoff; update
the wording assigned to msg (and both getOption("MSstatsLog") and
getOption("MSstatsMsg") calls) to clearly state that intensities are being
zeroed for rows with Q.Value >= global_qvalue_cutoff (referencing variables msg,
global_qvalue_cutoff, dn_input, QValue, and quantificationColumn) so the log
matches the operation performed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c29c20fb-2d1e-4033-b2ec-4fce54dfc5bd

📥 Commits

Reviewing files that changed from the base of the PR and between 5360aaf and c9ea825.

📒 Files selected for processing (4)
  • R/clean_DIANN.R
  • inst/tinytest/test_clean_DIANN.R
  • inst/tinytest/test_converters_DIANNtoMSstatsFormat.R
  • inst/tinytest/test_utils_anomaly_score.R

@tonywu1999 tonywu1999 merged commit 2b96a3d into devel Apr 2, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the fix-tests branch April 2, 2026 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant