tests(anomaly-models): Fix unit tests around anomaly models by tonywu1999 · Pull Request #123 · Vitek-Lab/MSstatsConvert

tonywu1999 · 2026-04-02T16:42:49Z

Motivation and Context

This line looks to assume that if certain rows have the same quality metrics + engineered metrics, then they should be lumped together when running an isolation forest (I guess that makes sense in a way, if metrics are EXACTLY the same, that indicates those rows are not necessarily independent in the way they were identified).

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

I have read the MSstats contributing guidelines
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

Motivation and Context

Rows with identical/clumped quality metrics were being grouped before running the isolation-forest anomaly model, causing brittle anomaly-score unit tests and unexpected behavior. The PR fixes a minor data.table assignment bug in DIANN cleaning that ensured a variable column name is used correctly, tightens Q-value cutoff assertions in tests, adjusts synthetic anomaly-test vectors to avoid long runs of identical values, and adds an explicit test for duplicated/clumped metric behavior.

Solution summary

Corrected a data.table assignment so the quantification column variable is interpreted as a column name.
Replaced fragile row-count-based Q-value assertions with a helper that verifies rows above cutoff have Intensity == 0 and rows below cutoff account for remaining rows.
Adjusted synthetic anomaly test vectors to reduce long runs of identical values and added a test that explicitly checks behavior when many rows share identical/clumped metrics.

Detailed changes

R/clean_DIANN.R
- In .cleanDIANNCleanAndFilterData (and related processing), ensured variable quantificationColumn is used correctly in data.table assignments by using parenthesized-column form: (quantificationColumn) := ...
- Kept other filtering, logging, and MBR/non-MBR branching unchanged.
inst/tinytest/test_clean_DIANN.R
- Added local helper expect_qvalue_cutoff(output, col, cutoff) that:
  - Asserts all rows with output[[col]] > cutoff have Intensity == 0.
  - Asserts rows with output[[col]] <= cutoff account for all remaining rows.
- Replaced previous direct count assertions with calls to expect_qvalue_cutoff for:
  - DetectionQValue (global_qvalue_cutoff)
  - LibQValue (qvalue_cutoff)
  - LibPGQValue (pg_qvalue_cutoff)
  - GlobalQValue and GlobalPGQValue when MBR = FALSE
inst/tinytest/test_converters_DIANNtoMSstatsFormat.R
- Updated expected output row counts to reflect corrected filtering/processing:
  - First DIANN converter test: expected nrow = 348
  - DIANN 2.0 test: expected nrow = 192
inst/tinytest/test_utils_anomaly_score.R
- Modified synthetic quality-metric vectors to avoid long runs of identical values:
  - "Progressively higher cumulative sums": replaced rep(0.1,5) with seq(0,0.1,length.out=5) + high tail.
  - "Extreme outlier": replaced rep(0.1,19) with seq(0,0.1,length.out=19) before the extreme value.
  - "Obvious ranking": first three levels changed from identical low values to slightly increasing (0.1, 0.11, 0.12).
- Added new Test 11 that constructs duplicated/clumped sequences across metrics and asserts that clumped high values produce lower average anomaly scores than isolated low values (explicitly exercising grouping/duplicity behavior of the anomaly model).
- Preserved existing property-, extreme-, and reproducibility-focused anomaly tests with adjusted inputs.

Unit tests added or modified

test_clean_DIANN.R
- Added expect_qvalue_cutoff helper and invoked it across global/lib/PG Q-value cutoff scenarios (including MBR = FALSE cases).
test_converters_DIANNtoMSstatsFormat.R
- Updated two row-count expectations (348 and 192) to match corrected output.
test_utils_anomaly_score.R
- Adjusted several synthetic vectors to avoid long identical-value runs.
- Added Test 11 to explicitly verify anomaly-model behavior when metrics are duplicated/clumped.
- Existing tests retained with input adjustments to improve robustness.

Coding guidelines

No coding guideline violations detected in the changed files; the data.table parenthesized-column assignment follows established patterns used elsewhere in the codebase.

coderabbitai · 2026-04-02T16:43:05Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8a78b2dc-0306-4988-9deb-7c4df11dc257

📥 Commits

Reviewing files that changed from the base of the PR and between 539229d and 1b796ca.

📒 Files selected for processing (1)

inst/tinytest/test_clean_DIANN.R

🚧 Files skipped from review as they are similar to previous changes (1)

inst/tinytest/test_clean_DIANN.R

📝 Walkthrough

Walkthrough

Fixes a data.table assignment in DIANN cleaning so a variable quantificationColumn is evaluated as a column name when zeroing intensities for rows above q-value cutoffs; updates related unit tests (q-value checks, converter row counts, anomaly-score inputs).

Changes

Cohort / File(s)	Summary
DIANN Cleaning Logic `R/clean_DIANN.R`	Changed a `data.table` assignment from `dn_input[QValue >= global_qvalue_cutoff, quantificationColumn := 0]` to `dn_input[QValue >= global_qvalue_cutoff, (quantificationColumn) := 0]` so the variable is evaluated as the target column when setting intensities to zero.
DIANN Cleaning Tests `inst/tinytest/test_clean_DIANN.R`	Added local helper `expect_qvalue_cutoff(output, col, cutoff)` and refactored assertions to validate that rows above cutoffs have `Intensity == 0`; adjusted MBR-related test scenarios to use global q-value columns with updated cutoff values.
Converter Tests `inst/tinytest/test_converters_DIANNtoMSstatsFormat.R`	Updated expected output row counts: first test 174 → 348 rows, DIANN 2.0 case 180 → 192 rows.
Anomaly Score Tests `inst/tinytest/test_utils_anomaly_score.R`	Modified synthetic quality-metric vectors (use gradual ramps, slight increases) across several tests and added a new Test 11 that asserts relative anomaly-score averages for duplicated/clumped sequences.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

refactor(diann): move q-value filtering to MSstatsClean for DIANN #116: modifies DIANN q-value filtering and touches the same cleaning codepath adjusted here.
fix(diann): Update DIANN cleaning to be more memory and speed efficient #115: changes how quantificationColumn is referenced/handled in .cleanDIANNCleanAndFilterData (data.table variable column handling).
feat(diann): Add support for DIANN 2.0 #109: earlier PR that introduced/modified the DIANN cleaning function edited by this change.

Suggested reviewers

mstaniak

Poem

🐰 A parenthesis snug, a quiet fix,
Names now land where code expects.
Zeros placed where q-values soar,
Tests updated, numbers more.
🥕🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The description addresses motivation and context with a specific technical reference but omits a detailed bullet-point list of changes in the Changes section and lacks specifics about which unit tests were added/modified.	Expand the Changes section with a detailed bullet-point list of all modifications across test files and add specific test names/purposes in the Testing section to clarify what was added or modified.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'tests(anomaly-models): Fix unit tests around anomaly models' accurately reflects the primary change: modifications to unit tests related to anomaly model behavior, as evidenced by changes across multiple test files and adjustments to synthetic quality-metric vectors.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

R/clean_DIANN.R (1)
174-178: Log message wording may cause confusion.

The log message says 'Filtering on Q.Value < 0.01' which implies rows are kept when QValue < cutoff. The code correctly zeros out intensities for rows where QValue >= cutoff, but the log phrasing could be clearer. Consider rewording to something like 'Zeroing intensity for rows with Q.Value >= 0.01' to match the actual operation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/clean_DIANN.R` around lines 174 - 178, The log message created in msg
currently reads as if rows with QValue < global_qvalue_cutoff are being kept,
but the code actually sets (quantificationColumn) to 0 for rows with QValue >=
global_qvalue_cutoff; update the wording assigned to msg (and both
getOption("MSstatsLog") and getOption("MSstatsMsg") calls) to clearly state that
intensities are being zeroed for rows with Q.Value >= global_qvalue_cutoff
(referencing variables msg, global_qvalue_cutoff, dn_input, QValue, and
quantificationColumn) so the log matches the operation performed.
inst/tinytest/test_clean_DIANN.R (1)
38-45: Second assertion is a mathematical tautology.

This assertion checks that sum(output[[col]] <= cutoff) equals nrow(output) - sum(output[[col]] > cutoff), which is always true by definition (rows either satisfy > cutoff or <= cutoff). It doesn't test any actual behavior of the filtering logic.

If the intent is to verify that rows below the cutoff retain their original (non-zero) intensity values, consider:
     expect_equal(
-        sum(output[[col]] <= cutoff),
-        nrow(output) - sum(output[[col]] > cutoff),
+        sum(output[[intensity_col]] > 0 & output[[col]] <= cutoff),
+        sum(output[[col]] <= cutoff),
         info = sprintf(
-            "Rows with %s <= %s should account for all rows not above the cutoff",
-            col, cutoff
+            "Rows with %s <= %s should retain non-zero %s",
+            col, cutoff, intensity_col
         )
     )
This would verify that rows passing the cutoff have non-zero intensity (assuming the source data has non-zero values).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@inst/tinytest/test_clean_DIANN.R` around lines 38 - 45, The current
expect_equal comparing sum(output[[col]] <= cutoff) to nrow(output) -
sum(output[[col]] > cutoff) is a tautology and should be replaced with an
assertion that actually tests the data: update the test to assert that all
intensity values for rows that pass the cutoff are non-zero, e.g. use output,
col and cutoff to check either expect_true(all(output[[col]][output[[col]] <=
cutoff] != 0)) or expect_equal(sum(output[[col]][output[[col]] <= cutoff] != 0),
sum(output[[col]] <= cutoff)) so the test verifies retained rows have non-zero
intensities.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@inst/tinytest/test_clean_DIANN.R`:
- Line 29: The function expect_qvalue_cutoff currently has intensity_col = NULL
which can mask failures because output[[NULL]] returns NULL; change the
signature of expect_qvalue_cutoff to default intensity_col to "Intensity" (i.e.,
intensity_col = "Intensity") and ensure all internal uses of intensity_col
(references to output[[intensity_col]] and any sum(...) or comparison logic)
operate on that non-NULL default so tests fail loudly when the Intensity column
is missing rather than silently passing.

---

Nitpick comments:
In `@inst/tinytest/test_clean_DIANN.R`:
- Around line 38-45: The current expect_equal comparing sum(output[[col]] <=
cutoff) to nrow(output) - sum(output[[col]] > cutoff) is a tautology and should
be replaced with an assertion that actually tests the data: update the test to
assert that all intensity values for rows that pass the cutoff are non-zero,
e.g. use output, col and cutoff to check either
expect_true(all(output[[col]][output[[col]] <= cutoff] != 0)) or
expect_equal(sum(output[[col]][output[[col]] <= cutoff] != 0), sum(output[[col]]
<= cutoff)) so the test verifies retained rows have non-zero intensities.

In `@R/clean_DIANN.R`:
- Around line 174-178: The log message created in msg currently reads as if rows
with QValue < global_qvalue_cutoff are being kept, but the code actually sets
(quantificationColumn) to 0 for rows with QValue >= global_qvalue_cutoff; update
the wording assigned to msg (and both getOption("MSstatsLog") and
getOption("MSstatsMsg") calls) to clearly state that intensities are being
zeroed for rows with Q.Value >= global_qvalue_cutoff (referencing variables msg,
global_qvalue_cutoff, dn_input, QValue, and quantificationColumn) so the log
matches the operation performed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c29c20fb-2d1e-4033-b2ec-4fce54dfc5bd

📥 Commits

Reviewing files that changed from the base of the PR and between 5360aaf and c9ea825.

📒 Files selected for processing (4)

R/clean_DIANN.R
inst/tinytest/test_clean_DIANN.R
inst/tinytest/test_converters_DIANNtoMSstatsFormat.R
inst/tinytest/test_utils_anomaly_score.R

inst/tinytest/test_clean_DIANN.R

tonywu1999 added 2 commits April 2, 2026 12:11

tests(anomaly-scores): Fix unit tests around anomaly scores

585213a

fix unit tests

c9ea825

tonywu1999 requested a review from devonjkohler April 2, 2026 16:43

tonywu1999 changed the title ~~Fix tests~~ tests(anomaly-models): Fix unit tests around anomaly models Apr 2, 2026

coderabbitai bot reviewed Apr 2, 2026

View reviewed changes

inst/tinytest/test_clean_DIANN.R Outdated Show resolved Hide resolved

tonywu1999 added 2 commits April 2, 2026 14:26

add unit tests for duplicitiy

539229d

fix q-value tests

1b796ca

tonywu1999 merged commit 2b96a3d into devel Apr 2, 2026
2 checks passed

tonywu1999 deleted the fix-tests branch April 2, 2026 18:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests(anomaly-models): Fix unit tests around anomaly models#123

tests(anomaly-models): Fix unit tests around anomaly models#123
tonywu1999 merged 4 commits intodevelfrom
fix-tests

tonywu1999 commented Apr 2, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 2, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tonywu1999 commented Apr 2, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Testing

Checklist Before Requesting a Review

Motivation and Context

Solution summary

Detailed changes

Unit tests added or modified

Coding guidelines

Uh oh!

coderabbitai bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tonywu1999 commented Apr 2, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 2, 2026 •

edited

Loading