tests(anomaly-models): Fix unit tests around anomaly models #123
tonywu1999 merged 4 commits into devel from
Conversation
📝 Walkthrough
Fixes a data.table assignment in DIANN cleaning so a variable column name is used correctly.
🚥 Pre-merge checks: 2 passed, 1 failed (inconclusive)
Actionable comments posted: 1
🧹 Nitpick comments (2)
R/clean_DIANN.R (1)
174-178: Log message wording may cause confusion.

The log message says 'Filtering on Q.Value < 0.01', which implies rows are kept when QValue < cutoff. The code correctly zeros out intensities for rows where QValue >= cutoff, but the log phrasing could be clearer. Consider rewording to something like 'Zeroing intensity for rows with Q.Value >= 0.01' to match the actual operation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_DIANN.R` around lines 174-178: the log message created in msg currently reads as if rows with QValue < global_qvalue_cutoff are being kept, but the code actually sets (quantificationColumn) to 0 for rows with QValue >= global_qvalue_cutoff; update the wording assigned to msg (and both getOption("MSstatsLog") and getOption("MSstatsMsg") calls) to clearly state that intensities are being zeroed for rows with Q.Value >= global_qvalue_cutoff (referencing variables msg, global_qvalue_cutoff, dn_input, QValue, and quantificationColumn) so the log matches the operation performed.

inst/tinytest/test_clean_DIANN.R (1)
38-45: Second assertion is a mathematical tautology.

This assertion checks that sum(output[[col]] <= cutoff) equals nrow(output) - sum(output[[col]] > cutoff), which is always true by definition (rows either satisfy > cutoff or <= cutoff). It doesn't test any actual behavior of the filtering logic. If the intent is to verify that rows below the cutoff retain their original (non-zero) intensity values, consider:

```diff
 expect_equal(
-  sum(output[[col]] <= cutoff),
-  nrow(output) - sum(output[[col]] > cutoff),
+  sum(output[[intensity_col]] > 0 & output[[col]] <= cutoff),
+  sum(output[[col]] <= cutoff),
   info = sprintf(
-    "Rows with %s <= %s should account for all rows not above the cutoff",
-    col, cutoff
+    "Rows with %s <= %s should retain non-zero %s",
+    col, cutoff, intensity_col
   )
 )
```

This would verify that rows passing the cutoff have non-zero intensity (assuming the source data has non-zero values).
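To see concretely why the flagged assertion can never fail, here is a tiny standalone demonstration (the vector and cutoff are illustrative, not taken from the test fixtures):

```r
# For any numeric vector x and any cutoff, every element satisfies exactly
# one of x > cutoff or x <= cutoff, so the two sides are equal by
# construction and the assertion is vacuous.
x <- c(0.001, 0.009, 0.2, 0.5)
cutoff <- 0.01
stopifnot(sum(x <= cutoff) == length(x) - sum(x > cutoff))
```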
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@inst/tinytest/test_clean_DIANN.R` around lines 38-45: the current expect_equal comparing sum(output[[col]] <= cutoff) to nrow(output) - sum(output[[col]] > cutoff) is a tautology and should be replaced with an assertion that actually tests the data: update the test to assert that all intensity values for rows that pass the cutoff are non-zero, e.g. use output, col and cutoff to check either expect_true(all(output[[col]][output[[col]] <= cutoff] != 0)) or expect_equal(sum(output[[col]][output[[col]] <= cutoff] != 0), sum(output[[col]] <= cutoff)) so the test verifies retained rows have non-zero intensities.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@inst/tinytest/test_clean_DIANN.R`:
- Line 29: The function expect_qvalue_cutoff currently has intensity_col = NULL
which can mask failures because output[[NULL]] returns NULL; change the
signature of expect_qvalue_cutoff to default intensity_col to "Intensity" (i.e.,
intensity_col = "Intensity") and ensure all internal uses of intensity_col
(references to output[[intensity_col]] and any sum(...) or comparison logic)
operate on that non-NULL default so tests fail loudly when the Intensity column
is missing rather than silently passing.
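A hedged sketch of the fix described above; the real expect_qvalue_cutoff helper in inst/tinytest/test_clean_DIANN.R may differ in shape, and the body here is illustrative. The key change is the non-NULL default for intensity_col:

```r
# Hypothetical sketch of the test helper; only the default argument change
# comes from the review, the assertions are illustrative.
expect_qvalue_cutoff <- function(output, col, cutoff,
                                 intensity_col = "Intensity") {
  # Fail loudly if the intensity column is absent, rather than letting a
  # NULL subscript make every downstream comparison silently pass.
  expect_true(intensity_col %in% names(output),
              info = sprintf("Column %s must exist in output", intensity_col))
  # Rows at or above the cutoff should have had their intensity zeroed.
  expect_true(all(output[[intensity_col]][output[[col]] >= cutoff] == 0),
              info = sprintf("%s should be 0 where %s >= %s",
                             intensity_col, col, cutoff))
}
```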
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c29c20fb-2d1e-4033-b2ec-4fce54dfc5bd
📒 Files selected for processing (4)
R/clean_DIANN.R
inst/tinytest/test_clean_DIANN.R
inst/tinytest/test_converters_DIANNtoMSstatsFormat.R
inst/tinytest/test_utils_anomaly_score.R
Motivation and Context
This line appears to assume that if certain rows have the same quality metrics and engineered metrics, then they should be lumped together when running an isolation forest (which makes sense in a way: if the metrics are EXACTLY the same, those rows are not necessarily independent in the way they were identified).
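A minimal data.table sketch of that grouping idea (the column names here are hypothetical, not taken from the package):

```r
library(data.table)
# Toy metric table: the first two rows share EXACTLY the same metrics.
dt <- data.table(qvalue   = c(0.001, 0.001, 0.020),
                 rt_error = c(0.10, 0.10, 0.90))
# Lump rows with identical quality + engineered metrics into one group,
# since they are not independent observations for the isolation forest.
dt[, group_id := .GRP, by = names(dt)]
```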
Motivation and Context
Rows with identical/clumped quality metrics were being grouped before running the isolation-forest anomaly model, causing brittle anomaly-score unit tests and unexpected behavior. The PR fixes a minor data.table assignment bug in DIANN cleaning that ensured a variable column name is used correctly, tightens Q-value cutoff assertions in tests, adjusts synthetic anomaly-test vectors to avoid long runs of identical values, and adds an explicit test for duplicated/clumped metric behavior.
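For context, a hedged sketch of the data.table idiom involved in that assignment fix (illustrative names, not the actual MSstats code): to assign into a column whose name is stored in a variable, the variable must be wrapped in parentheses on the left of `:=`.

```r
library(data.table)
# Illustrative only; column and variable names mirror the review prompt.
dt <- data.table(QValue = c(0.001, 0.050), Intensity = c(100, 200))
quantificationColumn <- "Intensity"  # column name held in a variable
# Without the parentheses, := would target a literal column named
# "quantificationColumn"; with them, the stored name "Intensity" is used.
dt[QValue >= 0.01, (quantificationColumn) := 0]
```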
Solution summary
Detailed changes
R/clean_DIANN.R
inst/tinytest/test_clean_DIANN.R
inst/tinytest/test_converters_DIANNtoMSstatsFormat.R
inst/tinytest/test_utils_anomaly_score.R
Unit tests added or modified
test_clean_DIANN.R
test_converters_DIANNtoMSstatsFormat.R
test_utils_anomaly_score.R