Skip to content

Feat turnover#186

Draft
tonywu1999 wants to merge 3 commits intodevelfrom
feat-turnover
Draft

Feat turnover#186
tonywu1999 wants to merge 3 commits intodevelfrom
feat-turnover

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

Motivation and Context

Please include relevant motivation and context of the problem along with a short summary of the solution.

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have run the devtools::document() command after my changes and committed the added files

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 2, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c83fe273-13c4-48d8-b378-433ef847c358

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat-turnover

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

.getNonMissingFilter() can return an undefined nonmissing_filter for some paths (e.g., impute=TRUE with censored_symbol=NULL, or censored_symbol values other than 0/NA). Consider initializing a default filter and handling unexpected censored_symbol values explicitly to avoid runtime errors and inconsistent missingness logic.

.getNonMissingFilter = function(input, impute, censored_symbol) {
    if (impute) {
        if (!is.null(censored_symbol)) {
            if (censored_symbol == "0") {
                nonmissing_filter = !is.na(input$newABUNDANCE) & input$newABUNDANCE != 0
            } else if (censored_symbol == "NA") {
                nonmissing_filter = !is.na(input$newABUNDANCE)
            }
        }
    } else {
        nonmissing_filter = !is.na(input$newABUNDANCE) & input$newABUNDANCE != 0
    }
    nonmissing_filter
Behavior Change

Missing/censoring logic was expanded from Light-only to all labels (e.g., quantile cutoff calculation and input$censored assignment). This can cause Heavy (or other) label rows to be censored/imputed and also affects cutoff estimation by mixing label distributions. Validate that downstream summarization/imputation and expected turnover semantics still hold, and consider reintroducing label scoping or making it configurable if needed.

    quantiles = input[!is.na(INTENSITY) & INTENSITY > 1,
                      quantile(ABUNDANCE,
                               prob = c(0.01, 0.25, 0.5, 0.75,
                                        censored_cutoff),
                               na.rm = TRUE)]
    iqr = quantiles[4] - quantiles[2]
    multiplier = (quantiles[5] - quantiles[4]) / iqr
    cutoff_lower = (quantiles[2] - multiplier * iqr)
    input$censored = !is.na(input$INTENSITY) &
        input$ABUNDANCE < cutoff_lower
    if (cutoff_lower <= 0 & !is.null(missing_symbol) & missing_symbol == "0") {
        zero_one_filter = !is.na(input$ABUNDANCE) & input$ABUNDANCE <= 0
        input$censored = ifelse(zero_one_filter, TRUE, input$censored)
    }
    if (!is.null(missing_symbol) & missing_symbol == "NA") {
        input$censored = ifelse(is.na(input$INTENSITY), TRUE,
                                input$censored)
    }

    msg = paste('** Log2 intensities under cutoff =',
                format(cutoff_lower, digits = 5),
                ' were considered as censored missing values.')
    msg_2 = paste("** Log2 intensities =", missing_symbol, "were considered as censored missing values.")

    getOption("MSstatsMsg")("INFO", msg)
    getOption("MSstatsMsg")("INFO", msg_2)

    getOption("MSstatsLog")("INFO", msg)
    getOption("MSstatsLog")("INFO", msg_2)

} else {
    if (missing_symbol == '0') {
        input$censored = !is.na(input$INTENSITY) &
            (input$INTENSITY == 1 | input$ABUNDANCE <= 0)
    } else if (missing_symbol == 'NA') {
        input$censored = is.na(input$ABUNDANCE)
    }
Runtime Risk

The nameStandards='unlabeled' branch uses input[is.na(input$LABEL), "PeptideSequence", drop = TRUE], which is not idiomatic for data.table and may not return the expected vector (or may fail if PeptideSequence is not present in input at that point). Consider using input[is.na(LABEL), unique(PeptideSequence)] (or similar) and/or verifying PeptideSequence exists before indexing.

 if (length(standards) == 1 && standards == "unlabeled") {
    standards = unique(input[is.na(input$LABEL), "PeptideSequence", drop = TRUE])
    if (length(standards) == 0) {
        msg = "nameStandards = 'unlabeled' but no unlabeled peptides found in data."
        getOption("MSstatsLog")("ERROR", msg)
        stop(msg)
    }
}
input = .normalizeGlobalStandards(input, peptides_dict, standards)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix unlabeled standards extraction

data.table subsetting does not support drop = TRUE, and selecting "PeptideSequence"
from input is unreliable since peptide sequence typically comes from peptides_dict.
Derive unlabeled standards via PEPTIDE (and map to PeptideSequence when available)
using data.table-safe expressions to avoid runtime errors.

R/utils_normalize.R [37-47]

 if (normalization_method == "GLOBALSTANDARDS") {
-     if (length(standards) == 1 && standards == "unlabeled") {
-        standards = unique(input[is.na(input$LABEL), "PeptideSequence", drop = TRUE])
+    if (length(standards) == 1 && standards == "unlabeled") {
+        unlabeled_peptides = unique(input[is.na(LABEL), PEPTIDE])
+        if (!is.null(peptides_dict) && length(unlabeled_peptides) > 0) {
+            standards = unique(peptides_dict[PEPTIDE %in% unlabeled_peptides, PeptideSequence])
+        } else {
+            standards = unlabeled_peptides
+        }
+        standards = standards[!is.na(standards)]
+
         if (length(standards) == 0) {
             msg = "nameStandards = 'unlabeled' but no unlabeled peptides found in data."
             getOption("MSstatsLog")("ERROR", msg)
             stop(msg)
         }
     }
     input = .normalizeGlobalStandards(input, peptides_dict, standards)
 }
Suggestion importance[1-10]: 8

__

Why: unique(input[is.na(input$LABEL), "PeptideSequence", drop = TRUE]) is not valid data.table syntax (drop is an unused argument) and will likely error at runtime; additionally input may not contain PeptideSequence prior to merging with peptides_dict. The suggested approach avoids the data.table subsetting bug and uses columns that are more reliably present (e.g., PEPTIDE), improving functional correctness.

Medium
Prevent NA in group detection

The %in% c("0", NA) check can propagate NA (so heavy_has_real_group_prof becomes NA)
and it is also brittle if GROUP is numeric 0 instead of "0". Make the logic
explicitly handle NA and normalize GROUP to character (or compare against both 0 and
"0"), so the subsequent if (...) never receives an NA.

R/dataProcessPlots.R [264-269]

 has_heavy_prof = any(processed$LABEL == "H")
-heavy_has_real_group_prof = has_heavy_prof && !all(processed[LABEL == "H", GROUP] %in% c("0", NA))
+heavy_groups_prof = processed[LABEL == "H", GROUP]
+heavy_has_real_group_prof =
+  has_heavy_prof &&
+  any(!is.na(heavy_groups_prof) & !(as.character(heavy_groups_prof) %in% c("0")))
+
 tempGroupName = unique(processed[, c("GROUP", "RUN")])
 if (length(unique(processed$LABEL)) == 2 && !heavy_has_real_group_prof) {
-  tempGroupName = tempGroupName[GROUP != '0']
+  tempGroupName = tempGroupName[as.character(GROUP) != "0"]
 }
Suggestion importance[1-10]: 7

__

Why: The current %in% c("0", NA) can yield NA, which can propagate through all()/! and make the if (...) condition NA, causing missing value where TRUE/FALSE needed. The proposed rewrite makes the heavy-group detection robust to NA and to GROUP being numeric vs character, improving correctness in edge cases.

Medium
Stabilize factor label mapping

factor(LABEL, labels=...) depends on the current (often alphabetical) ordering of
factor levels, which can silently swap "Heavy/Light" or "Reference/Endogenous".
Specify levels= explicitly (e.g., c("H","L")) to guarantee correct label mapping
regardless of existing factor levels.

R/dataProcessPlots.R [294-303]

 if (length(unique(processed$LABEL)) == 2) {
   if (heavy_has_real_group_prof) {
-    processed[, LABEL := factor(LABEL, labels = c("Heavy", "Light"))]
+    processed[, LABEL := factor(LABEL, levels = c("H", "L"), labels = c("Heavy", "Light"))]
   } else {
-    processed[, LABEL := factor(LABEL, labels = c("Reference", "Endogenous"))]
+    processed[, LABEL := factor(LABEL, levels = c("H", "L"), labels = c("Reference", "Endogenous"))]
   }
 } else {
   if (unique(processed$LABEL) == "L") {
-    processed[, LABEL := factor(LABEL, labels = c("Endogenous"))]
+    processed[, LABEL := factor(LABEL, levels = c("L"), labels = c("Endogenous"))]
   } else {
Suggestion importance[1-10]: 6

__

Why: Using factor(LABEL, labels = ...) without specifying levels= can mis-map labels if LABEL is already a factor with unexpected level ordering. Explicit levels = c("H","L") (and c("L") for single-label) makes the mapping deterministic and prevents subtle plot-label swaps.

Low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant