original implementation of spectronaut converter updates#122
original implementation of spectronaut converter updates#122tonywu1999 merged 14 commits intodevelfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughExtended Spectronaut-to-MSstats conversion: added Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Converter as SpectronauttoMSstatsFormat()
participant Cleaner as .cleanRawSpectronaut()
participant Validator as .validateSpectronautInput()
participant ColumnStd as .addSpectronautColumnsIfMissing()
participant IntResolver as .resolveSpectronautIntensityColumn()
participant IsoAssigner as .assignSpectronautIsotopeLabelType()
participant Input as Spectronaut Input
User->>Converter: call(intensity, peptideSequenceColumn, heavyLabels)
Converter->>Cleaner: forward params + data
Cleaner->>Validator: check required cols & peptideSequenceColumn
Validator-->>Cleaner: validation result
Cleaner->>ColumnStd: synthesize missing Spectronaut cols
ColumnStd-->>Cleaner: augmented data
Cleaner->>IntResolver: map/resolve intensity column
IntResolver-->>Cleaner: resolved intensity column name
Cleaner->>IsoAssigner: assign IsotopeLabelType (if heavyLabels)
IsoAssigner-->>Cleaner: labeled data
Cleaner-->>Converter: cleaned data
Converter-->>User: MSstats-formatted output
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
inst/tinytest/test_spectronaut_protein_turnover.R (1)
110-116: This check doesn't prove raw-column resolution picked the same intensity.Comparing only
nrow()will still pass if"FG.MS1Quantity"resolves to the wrong numeric column but preserves filtering. Compare the alignedIntensityvalues againstoutput_ms1instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@inst/tinytest/test_spectronaut_protein_turnover.R` around lines 110 - 116, The test currently only checks row counts, which doesn't verify that the "FG.MS1Quantity" column produced the same numeric intensities as the alias; update the assertion to compare the Intensity values between output_raw_col and output_ms1 instead of nrow(). Specifically, align/order both data.frames by the same key columns (e.g. Run/Precursor/Charge/Peptide or the keys used in SpectronauttoMSstatsFormat) or perform a join on those unique identifiers, then compare the resulting Intensity vectors (from output_raw_col$Intensity and output_ms1$Intensity) with expect_equal (or expect_equal with a small tolerance if numeric rounding is possible). Ensure you reference the SpectronauttoMSstatsFormat call that created output_raw_col and the expected output_ms1 in the updated assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/clean_Spectronaut.R`:
- Around line 46-54: The subset of spec_input currently builds cols then
intersects with colnames(spec_input), which unintentionally drops the
standardized labelColumn and PEPStrippedSequence when heavyLabel is enabled;
update the cols construction used before spec_input = spec_input[, cols, with =
FALSE] (and the similar projections at the other affected sites) to always
include labelColumn and "PEPStrippedSequence" (in addition to
anomalyModelFeatures when calculateAnomalyScores is TRUE) so the heavy-label
classifier functions correctly, and ensure downstream classification logic uses
labelColumn rather than PeptideSequence when heavyLabel is TRUE.
- Around line 238-256: The current validation in clean_Spectronaut.R relaxes
required columns too far; update the checks so that required_columns includes
core fields RFileName and RCondition (in addition to FGCharge) and add a
separate check that at least one of EGModifiedSequence or FGLabeledSequence is
present; do not remove or bypass validation for fields other than the four that
.addMissingSpectronautColumns synthesizes (FFrgLossType,
FExcludedFromQuantification, FFrgIon, FCharge). Locate the
required_columns/missing_columns logic and the protein-name check (symbols
required_columns, missing_columns, setdiff, getOption("MSstatsLog")) and modify
it to enforce RFileName and RCondition and the
EGModifiedSequence|FGLabeledSequence existence before proceeding.
In `@R/converters_SpectronauttoMSstatsFormat.R`:
- Around line 75-78: The current branch silently accepts a vector for the
intensity argument; instead change the logic in the SpectronauttoMSstatsFormat
converter to treat an omitted argument separately (use missing(intensity) to
apply the default) and reject explicit multi-value inputs: if missing(intensity)
keep the current default-selection behavior, otherwise require length(intensity)
== 1 and call stop() with a clear error if length > 1 so callers cannot
accidentally pass multiple channels.
---
Nitpick comments:
In `@inst/tinytest/test_spectronaut_protein_turnover.R`:
- Around line 110-116: The test currently only checks row counts, which doesn't
verify that the "FG.MS1Quantity" column produced the same numeric intensities as
the alias; update the assertion to compare the Intensity values between
output_raw_col and output_ms1 instead of nrow(). Specifically, align/order both
data.frames by the same key columns (e.g. Run/Precursor/Charge/Peptide or the
keys used in SpectronauttoMSstatsFormat) or perform a join on those unique
identifiers, then compare the resulting Intensity vectors (from
output_raw_col$Intensity and output_ms1$Intensity) with expect_equal (or
expect_equal with a small tolerance if numeric rounding is possible). Ensure you
reference the SpectronauttoMSstatsFormat call that created output_raw_col and
the expected output_ms1 in the updated assertion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1ec6a1bb-a0f5-4f05-9f1e-acc710829075
⛔ Files ignored due to path filters (2)
inst/tinytest/raw_data/Spectronaut/boxcar_protein_turnover_input.csvis excluded by!**/*.csvinst/tinytest/raw_data/Spectronaut/run_order.csvis excluded by!**/*.csv
📒 Files selected for processing (5)
R/clean_Spectronaut.RR/converters_SpectronauttoMSstatsFormat.Rinst/tinytest/test_clean_Spectronaut.Rinst/tinytest/test_converters_SpectronauttoMSstatsFormat.Rinst/tinytest/test_spectronaut_protein_turnover.R
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
R/clean_Spectronaut.R (1)
62-64:⚠️ Potential issue | 🟠 MajorKeep
RFileNamein the required-column set.
RFileNameis still renamed toRunon Line 43, so dropping it fromrequired_columnsjust lets malformed inputs pass boundary validation and fail later during annotation/preprocess with a less actionable error.Suggested fix
- required_columns = c("FGCharge") + required_columns = c("FGCharge", "RFileName")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 62 - 64, The required-columns check currently only lists "FGCharge" so malformed inputs missing the original filename column slip through; add "RFileName" to the required_columns vector (the same name that is later renamed to "Run" when processing spec_input) so missing RFileName is caught early by the missing_columns check and reported before downstream annotation/preprocess (update the required_columns definition used when computing missing_columns on spec_input).R/converters_SpectronauttoMSstatsFormat.R (1)
56-57:⚠️ Potential issue | 🔴 CriticalDefault
intensitystill reaches cleaning as a length-2 vector.Omitting
intensityleavesc("PeakArea", "NormalizedPeakArea")intact here, and the commented-out guard means that vector is forwarded unchanged into.resolveSpectronautIntensityColumn(), which expects a scalar. The default/example call therefore errors before cleaning starts.Suggested fix
SpectronauttoMSstatsFormat = function( input, annotation = NULL, - intensity = c('PeakArea', 'NormalizedPeakArea'), + intensity = "PeakArea", peptideSequenceColumn = "EG.ModifiedSequence", heavyLabels = NULL, @@ ) { - # # Standardize the intensity value when it is a raw column name so that the - # # legacy match.arg() inside .cleanRawSpectronaut sees it correctly. - # known_aliases = c('PeakArea', 'NormalizedPeakArea') - # if (length(intensity) > 1) { - # # No value supplied: use first (default) - # intensity = intensity[1] - # } - # if (!(intensity %in% known_aliases)) { - # # Treat as a raw column name and standardize it - # intensity = .standardizeColnames(intensity) - # } + if (length(intensity) != 1L) { + stop("'intensity' must be a single value.") + }Also applies to: 71-81, 119-123
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/clean_Spectronaut.R`:
- Around line 49-50: The call to .assignSpectronautIsotopeLabelType is passing
msstats_object (fourth arg) but the function signature accepts only three
params, causing an error and preventing the early return when heavyLabels is
NULL; remove the extra msstats_object argument from the calls that pass
(spec_input, heavyLabels, peptideSequenceColumn) and add or ensure an early
return inside .assignSpectronautIsotopeLabelType when heavyLabels is NULL so it
exits before pattern logic runs; also fix the fcase usage inside
.assignSpectronautIsotopeLabelType by either defining heavy_pattern_escaped and
bare_pattern from heavyLabels (escaping regex) before they are referenced or by
rewriting the fcase branches to not reference those undefined symbols; apply the
same fixes to the duplicate block around the other occurrence (lines ~189-207)
where .assignSpectronautIsotopeLabelType is invoked or similar pattern logic
exists.
- Around line 79-83: The presence check for the peptide column is inverted: in
the block using .standardizeColnames(peptideSequenceColumn) and
colnames(spec_input) you currently stop when the column exists; change the
predicate to trigger only when the standardized peptideSequenceColumn is NOT in
colnames(spec_input). Update the error path (getOption("MSstatsLog")("ERROR",
msg) and stop(msg)) to run under that negated condition and keep the existing
message (optionally fix grammar to "column is missing" or list missing columns)
so valid Spectronaut inputs are not rejected by clean_Spectronaut.R.
---
Duplicate comments:
In `@R/clean_Spectronaut.R`:
- Around line 62-64: The required-columns check currently only lists "FGCharge"
so malformed inputs missing the original filename column slip through; add
"RFileName" to the required_columns vector (the same name that is later renamed
to "Run" when processing spec_input) so missing RFileName is caught early by the
missing_columns check and reported before downstream annotation/preprocess
(update the required_columns definition used when computing missing_columns on
spec_input).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 18507982-2d6a-43a6-b051-230568d6ff63
📒 Files selected for processing (7)
R/clean_Spectronaut.RR/converters_SpectronauttoMSstatsFormat.Rman/DIANNtoMSstatsFormat.Rdman/MSstatsClean.Rdman/SpectronauttoMSstatsFormat.Rdman/dot-cleanRawDIANN.Rdman/dot-cleanRawSpectronaut.Rd
✅ Files skipped from review due to trivial changes (1)
- man/DIANNtoMSstatsFormat.Rd
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
R/clean_Spectronaut.R (1)
49-51:⚠️ Potential issue | 🔴 CriticalFunction call passes 4 arguments but
.assignSpectronautIsotopeLabelTypeonly accepts 3.The call includes
msstats_objectas the 4th argument, but the function signature at lines 189-190 only defines 3 parameters (spec_input,heavyLabels,peptideSequenceColumn). This will cause an "unused argument" error at runtime whenheavyLabelsis non-NULL.Remove the extra
msstats_objectargument:spec_input = .assignSpectronautIsotopeLabelType( - spec_input, heavyLabels, peptideSequenceColumn, msstats_object) + spec_input, heavyLabels, peptideSequenceColumn)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 49 - 51, The call to .assignSpectronautIsotopeLabelType passes a fourth argument msstats_object while the function .assignSpectronautIsotopeLabelType only accepts three parameters (spec_input, heavyLabels, peptideSequenceColumn); remove the extra msstats_object argument from the call where spec_input is assigned (the invocation at spec_input = .assignSpectronautIsotopeLabelType(...)) so the call matches the function signature and avoids the "unused argument" runtime error.
🧹 Nitpick comments (1)
R/clean_Spectronaut.R (1)
152-156: Unnecessarycollapseargument inpaste0.The
collapseparameter only affects vectors being collapsed into a single string. Here it's applied to a single concatenated string, making it dead code.if (!(resolved %in% available_cols)) { stop(paste0( - "Intensity column '", intensity, "' not found in input data. ", collapse = ", ") + "Intensity column '", intensity, "' not found in input data.") ) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 152 - 156, The stop() call inside the check for resolved in available_cols is using paste0(..., collapse = ",") which is unnecessary; remove the collapse argument from the paste0 invocation (or switch to paste if you intended to collapse a vector) so the stop message is created simply via paste0("Intensity column '", intensity, "' not found in input data. "); update the call around the variables resolved, available_cols, and intensity accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/clean_Spectronaut.R`:
- Around line 79-84: The error message built when peptideSequenceColumn is
missing uses incorrect grammar; update the string in the check that uses
.standardizeColnames(peptideSequenceColumn) %in% colnames(spec_input) so that
msg reads "The following column is missing from the input data: " (singular "is
missing") instead of "are missing"; ensure the same corrected msg is passed to
getOption("MSstatsLog")("ERROR", msg) and stop(msg).
- Around line 179-190: The function .assignSpectronautIsotopeLabelType currently
documents a 4th parameter msstats_object and accepts an unused parameter
peptideSequenceColumn while the body uses the renamed PeptideSequence column
directly; remove the unused argument peptideSequenceColumn from the signature
and from any calls, update the roxygen block to remove msstats_object so the
params match the function, and ensure the function still references
PeptideSequence and sets/updates IsotopeLabelType as before; also update any
call sites that passed peptideSequenceColumn (e.g., the callers that invoked
.assignSpectronautIsotopeLabelType) to call the new 2-parameter signature
(spec_input, heavyLabels).
- Around line 196-208: The light/heavy detection fails because bare_amino_acids
is taken from heavyLabels without removing digits or brackets; change the
extraction to strip brackets and trailing digits so bare_amino_acids is just the
alphabetic residue name (e.g. replace sub("\\[.*\\]", "", heavyLabels) with
something like sub("\\d+$", "", gsub("^\\[|\\]$", "", heavyLabels))). Keep
heavy_brackets_escaped_pattern built from the original heavyLabels (escaped for
[]), build bare_amino_acids_pattern from the cleaned alphabetic names, and leave
the spec_input[, IsotopeLabelType := data.table::fcase(...)] logic intact so
grepl(heavy_brackets_escaped_pattern, PeptideSequence, perl=TRUE) finds
bracketed labels and grepl(bare_amino_acids_pattern, PeptideSequence, perl=TRUE)
finds unlabeled residues.
---
Duplicate comments:
In `@R/clean_Spectronaut.R`:
- Around line 49-51: The call to .assignSpectronautIsotopeLabelType passes a
fourth argument msstats_object while the function
.assignSpectronautIsotopeLabelType only accepts three parameters (spec_input,
heavyLabels, peptideSequenceColumn); remove the extra msstats_object argument
from the call where spec_input is assigned (the invocation at spec_input =
.assignSpectronautIsotopeLabelType(...)) so the call matches the function
signature and avoids the "unused argument" runtime error.
---
Nitpick comments:
In `@R/clean_Spectronaut.R`:
- Around line 152-156: The stop() call inside the check for resolved in
available_cols is using paste0(..., collapse = ",") which is unnecessary; remove
the collapse argument from the paste0 invocation (or switch to paste if you
intended to collapse a vector) so the stop message is created simply via
paste0("Intensity column '", intensity, "' not found in input data. "); update
the call around the variables resolved, available_cols, and intensity
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 10170c54-3dbe-489b-9b3c-b224e81d885c
📒 Files selected for processing (3)
R/clean_Spectronaut.Rinst/tinytest/test_clean_Spectronaut.Rinst/tinytest/test_converters_SpectronauttoMSstatsFormat.R
✅ Files skipped from review due to trivial changes (1)
- inst/tinytest/test_clean_Spectronaut.R
🚧 Files skipped from review as they are similar to previous changes (1)
- inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (3)
R/clean_Spectronaut.R (2)
161-177:⚠️ Potential issue | 🟠 MajorDocumentation and implementation mismatch on
heavyLabelsformat.The roxygen documentation (lines 168-174, 180-181) describes
heavyLabelsas the label name alone (e.g.,"Lys6"), but the implementation at line 194 usessub("\\[.*\\]", "", heavyLabels)which expects a format like"K[Lys6]"(amino acid letter + bracketed label). The test file confirms the bracketed format is expected (heavyLabel = c("K[Lys6]")).With input
"Lys6"(as documented),bare_amino_acidsbecomes"Lys6"(no change), and the light classification logic breaks because it looks for literal "Lys6" in sequences instead of "K".Update the documentation to match the actual expected format:
📝 Proposed documentation fix
-#' When \code{heavyLabel} is provided, each row is classified as heavy +#' When \code{heavyLabels} is provided, each row is classified as heavy #' (\code{"H"}), light (\code{"L"}), or unlabeled (\code{NA}) by inspecting #' the labeled sequence column for the presence of the label tag. #' #' In Spectronaut protein turnover reports, heavy peptides appear in #' \code{FG.LabeledSequence} with a bracketed modification, e.g. -#' \code{_PEPTIDEK[Lys6]_}. Any sequence that contains -#' \code{[<heavyLabel>]} is classified as heavy; all others are light. -#' Sequences that do not have amino acids that can carry the label -#' are classified as \code{NA}. For example, if \code{heavyLabels} is -#' \code{"Lys6"}, then \code{PEPTIDEZ} is classified as NA since it -#' has no lysine residues that could be labeled. +#' \code{_PEPTIDEK[Lys6]_}. The \code{heavyLabels} parameter should be +#' specified as the amino acid letter followed by the bracketed label, +#' e.g. \code{"K[Lys6]"}. Sequences containing the bracketed label are +#' classified as heavy; sequences containing the bare amino acid (e.g. "K") +#' without the label are light; sequences lacking the amino acid entirely +#' are classified as \code{NA}.At lines 180-181:
-#' `@param` heavyLabels Character scalar heavy label name (e.g. \code{"Lys6"}), +#' `@param` heavyLabels Character vector of heavy labels in format amino acid + bracketed +#' modification (e.g. \code{"K[Lys6]"}, \code{"R[Arg10]"}), #' or \code{NULL}.Also applies to: 194-206
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 161 - 177, The docs and implementation disagree about heavyLabels format: the code in clean_Spectronaut.R (logic around heavyLabels, sub("\\[.*\\]", "", heavyLabels), and FG.LabeledSequence classification) expects labels like "K[Lys6]" (amino-acid plus bracketed label) but the roxygen text claims just "Lys6"; update the documentation to specify that heavyLabels must include the residue letter with the bracketed modification (e.g., "K[Lys6]") and note that the function strips the bracketed part to determine the residue (used by the IsotopeLabelType assignment and light/heavy classification), or alternatively change the implementation to accept bare labels by mapping residue letters from a provided label-to-residue mapping if you want to keep "Lys6" syntax—adjust the roxygen examples and parameter description for heavyLabels and any examples/tests to match the chosen format.
79-84:⚠️ Potential issue | 🟡 MinorMinor grammar issue in error message.
"The following column are missing" should be "The following column is missing" for proper subject-verb agreement.
✏️ Proposed fix
if (!(.standardizeColnames(peptideSequenceColumn) %in% colnames(spec_input))) { - msg = paste("The following column are missing from the input data:", + msg = paste("The following column is missing from the input data:", peptideSequenceColumn)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 79 - 84, Update the error message string in the check that verifies peptideSequenceColumn exists in spec_input: the code block using .standardizeColnames(peptideSequenceColumn) and getOption("MSstatsLog") currently builds msg with "The following column are missing from the input data:" — change this to "The following column is missing from the input data:" (or pluralize based on context if you intentionally expect multiple columns) so the subject-verb agreement is correct when stop(msg) is called.R/converters_SpectronauttoMSstatsFormat.R (1)
54-58:⚠️ Potential issue | 🟡 MinorMulti-value
intensityinput still silently uses first element.While the default is now a single value, if a caller accidentally passes
intensity = c("PeakArea", "NormalizedPeakArea"), R's vectorized behavior will silently proceed with only the first element. The past review suggested usingmissing(intensity)for the omitted-argument path and explicitly requiring a scalar.🛡️ Proposed validation in function body after line 70
) { + if (length(intensity) != 1L) { + stop("'intensity' must be a single value.") + } validation_config = list(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/converters_SpectronauttoMSstatsFormat.R` around lines 54 - 58, The function SpectronauttoMSstatsFormat currently accepts a vector for the intensity argument and will silently use only the first element; add an explicit validation near the start of the function (after the initial argument parsing, e.g., after line ~70) that uses missing(intensity) to detect omitted argument and otherwise checks that length(intensity) == 1 and is.character/introspect the type, and if not, stop() with a clear error message instructing callers to pass a single intensity name (e.g., "PeakArea"); reference the parameter intensity and function SpectronauttoMSstatsFormat when implementing this guard.
🧹 Nitpick comments (1)
R/clean_Spectronaut.R (1)
152-156: Unnecessarycollapseargument inpaste0().The
collapseparameter inpaste0()at line 154 is ineffective here since there's only a single string being constructed. It doesn't cause an error but is misleading.✏️ Proposed fix
if (!(resolved %in% available_cols)) { stop(paste0( - "Intensity column '", intensity, "' not found in input data. ", collapse = ", ") - ) + "Intensity column '", intensity, "' not found in input data.")) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 152 - 156, The stop() call that constructs the error message using paste0 when checking if resolved is in available_cols includes an unnecessary collapse argument; remove the collapse = ", " from the paste0(...) in the conditional that references resolved, available_cols and intensity so the message is built with a single paste0 call (or simply concatenate the strings) rather than using the ineffective collapse parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R`:
- Around line 392-398: The test calls to SpectronauttoMSstatsFormat use the
wrong argument name `heavyLabel`; update those calls to use the correct
parameter name `heavyLabels` so the function receives the intended vector (e.g.,
change `heavyLabel = c("K[Lys6]")` to `heavyLabels = c("K[Lys6]")` in both test
invocations), then re-run the tests to ensure the heavy-label classification
logic in SpectronauttoMSstatsFormat is actually exercised.
- Line 19: The test is asserting against the wrong object; change the assertion
to check the converted result's column instead of the raw input. Replace the use
of spectronaut_raw$IsotopeLabelType with output$IsotopeLabelType in the
expect_true(all(...)) call (and optionally ensure output$IsotopeLabelType
exists/non-NULL before asserting) so the test validates the conversion that adds
IsotopeLabelType.
In `@man/SpectronauttoMSstatsFormat.Rd`:
- Around line 53-63: The docs for the heavyLabels parameter are misleading:
update the roxygen param documentation in
R/converters_SpectronauttoMSstatsFormat.R (the source for
man/SpectronauttoMSstatsFormat.Rd) to show the actual expected format (e.g. use
examples like c("K[Lys6]") or c("K[Lys6]", "R[Arg10]") and state that labels
must match the bracketed substring in the peptide sequence), preserving the
explanation of how peptides are classified (IsotopeLabelType "H"/"L"/NA) and
that NULL defaults to all "L"; run roxygenize to regenerate the .Rd.
In `@R/converters_SpectronauttoMSstatsFormat.R`:
- Around line 15-25: The documentation for heavyLabels is misleading: the
implementation in .assignSpectronautIsotopeLabelType expects labels with the
amino-acid prefix plus the bracketed isotope (e.g. "K[Lys6]" or
c("K[Lys6]","R[Arg10]")) rather than bare names like "Lys6"; update the
parameter description for heavyLabels to state the required format (amino-acid
letter immediately followed by the bracketed label), provide corrected examples
matching the test (e.g. heavyLabels = c("K[Lys6]","R[Arg10]")), and keep the
existing behavior notes about classification to "H"/"L"/NA and the NULL default.
---
Duplicate comments:
In `@R/clean_Spectronaut.R`:
- Around line 161-177: The docs and implementation disagree about heavyLabels
format: the code in clean_Spectronaut.R (logic around heavyLabels,
sub("\\[.*\\]", "", heavyLabels), and FG.LabeledSequence classification) expects
labels like "K[Lys6]" (amino-acid plus bracketed label) but the roxygen text
claims just "Lys6"; update the documentation to specify that heavyLabels must
include the residue letter with the bracketed modification (e.g., "K[Lys6]") and
note that the function strips the bracketed part to determine the residue (used
by the IsotopeLabelType assignment and light/heavy classification), or
alternatively change the implementation to accept bare labels by mapping residue
letters from a provided label-to-residue mapping if you want to keep "Lys6"
syntax—adjust the roxygen examples and parameter description for heavyLabels and
any examples/tests to match the chosen format.
- Around line 79-84: Update the error message string in the check that verifies
peptideSequenceColumn exists in spec_input: the code block using
.standardizeColnames(peptideSequenceColumn) and getOption("MSstatsLog")
currently builds msg with "The following column are missing from the input
data:" — change this to "The following column is missing from the input data:"
(or pluralize based on context if you intentionally expect multiple columns) so
the subject-verb agreement is correct when stop(msg) is called.
In `@R/converters_SpectronauttoMSstatsFormat.R`:
- Around line 54-58: The function SpectronauttoMSstatsFormat currently accepts a
vector for the intensity argument and will silently use only the first element;
add an explicit validation near the start of the function (after the initial
argument parsing, e.g., after line ~70) that uses missing(intensity) to detect
omitted argument and otherwise checks that length(intensity) == 1 and
is.character/introspect the type, and if not, stop() with a clear error message
instructing callers to pass a single intensity name (e.g., "PeakArea");
reference the parameter intensity and function SpectronauttoMSstatsFormat when
implementing this guard.
---
Nitpick comments:
In `@R/clean_Spectronaut.R`:
- Around line 152-156: The stop() call that constructs the error message using
paste0 when checking if resolved is in available_cols includes an unnecessary
collapse argument; remove the collapse = ", " from the paste0(...) in the
conditional that references resolved, available_cols and intensity so the
message is built with a single paste0 call (or simply concatenate the strings)
rather than using the ineffective collapse parameter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bccb970a-19d9-491f-9205-f00e12fa0456
⛔ Files ignored due to path filters (1)
inst/tinytest/raw_data/Spectronaut/boxcar_protein_turnover_input.csvis excluded by!**/*.csv
📒 Files selected for processing (4)
R/clean_Spectronaut.RR/converters_SpectronauttoMSstatsFormat.Rinst/tinytest/test_converters_SpectronauttoMSstatsFormat.Rman/SpectronauttoMSstatsFormat.Rd
There was a problem hiding this comment.
♻️ Duplicate comments (3)
R/clean_Spectronaut.R (3)
79-84:⚠️ Potential issue | 🟡 MinorGrammar issue: "are" should be "is".
Since this error message refers to a single column, use singular verb agreement.
if (!(.standardizeColnames(peptideSequenceColumn) %in% colnames(spec_input))) { - msg = paste("The following column are missing from the input data:", + msg = paste("The following column is missing from the input data:", peptideSequenceColumn)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 79 - 84, The error message uses plural verb "are" but refers to a single column; update the message built in the block that checks .standardizeColnames(peptideSequenceColumn) against colnames(spec_input) to use singular agreement (e.g., "The following column is missing from the input data:") and ensure this updated string is passed to getOption("MSstatsLog")("ERROR", ...) and stop(...), referencing the peptideSequenceColumn variable so the logged/stopped text remains informative.
191-203:⚠️ Potential issue | 🟠 MajorHeavy-label pattern logic is still broken.
When
heavyLabels = "Lys6"(without brackets),sub("\\[.*\\]", "", heavyLabels)does nothing, sobare_amino_acids = "Lys6". Similarly,heavy_brackets_escaped_patternalso resolves to"Lys6"since there are no brackets to escape. Both patterns match the same substring, and neither correctly identifies bracketed labels like[Lys6]vs. bare amino acids likeK.Per the docstring, the function should:
- Detect
[Lys6](bracketed) → Heavy- Detect bare
K(lysine residue without label) → Light- No lysine → NA
Fix:
-.assignSpectronautIsotopeLabelType = function(spec_input, heavyLabels) { - IsotopeLabelType = PeptideSequence = NULL - if (is.null(heavyLabels)) { - return(spec_input) - } - - bare_amino_acids = sub("\\[.*\\]", "", heavyLabels) - bare_amino_acids_pattern = paste0(bare_amino_acids, collapse = "|") - heavy_pattern = paste0(heavyLabels, collapse = "|") - heavy_brackets_escaped_pattern = paste( - gsub("([\\[\\]])", "\\\\\\1", heavy_pattern, perl = TRUE), - collapse = "|" - ) +.assignSpectronautIsotopeLabelType = function(spec_input, heavyLabels) { + IsotopeLabelType = PeptideSequence = NULL + if (is.null(heavyLabels)) { + return(spec_input) + } + + # Map 3-letter amino acid codes to single-letter codes + aa_map = c(Lys = "K", Arg = "R", Leu = "L", Ile = "I", + Val = "V", Phe = "F", Tyr = "Y", Trp = "W") + # Extract amino acid name (e.g., "Lys" from "Lys6") and map to single letter + aa_names = sub("[0-9]+$", "", heavyLabels) + bare_amino_acids = aa_map[aa_names] + bare_amino_acids_pattern = paste0(bare_amino_acids, collapse = "|") + # Build pattern to match bracketed labels: [Lys6] + heavy_brackets_escaped_pattern = paste0( + "\\[", heavyLabels, "\\]", collapse = "|" + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 191 - 203, The current logic treats "Lys6" and "[Lys6]" the same; fix by (1) deriving the one-letter residue(s) from heavyLabels (e.g., convert "Lys6" -> "K") into bare_amino_acids (remove numeric suffix and map full AA name to one-letter), (2) build heavy_brackets_escaped_pattern to match bracketed labels explicitly by wrapping each heavy label in escaped brackets (e.g., paste0("\\[", gsub("([\\[\\]])","\\\\\\\\\\1", heavyLabel), "\\]")), and (3) make bare_amino_acids_pattern match the residue letter only when not inside brackets (use a negative lookbehind like (?<!\\[) or an equivalent regex) so the fcase on spec_input[, IsotopeLabelType := ...] correctly sets "H" for bracketed labels, "L" for bare residue letters, and NA otherwise; update references: heavyLabels, bare_amino_acids, bare_amino_acids_pattern, heavy_brackets_escaped_pattern, and the spec_input[, IsotopeLabelType := data.table::fcase(...)].
61-85:⚠️ Potential issue | 🟠 MajorMissing validation for
RFileNameandRCondition.These columns are used in line 31 and renamed in line 44. If absent,
setnames()fails with a cryptic error. Add them torequired_columnsso callers receive a clear diagnostic..validateSpectronautInput = function(spec_input, peptideSequenceColumn) { - # Only FGCharge is truly required - required_columns = c("FGCharge") + required_columns = c("FGCharge", "RFileName", "RCondition") missing_columns = setdiff(required_columns, colnames(spec_input))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 61 - 85, The validation in .validateSpectronautInput currently only requires "FGCharge" and misses RFileName and RCondition which are later used/renamed (causing setnames() to fail); update the required_columns vector in .validateSpectronautInput to include "RFileName" and "RCondition", adjust the missing_columns/error messages accordingly so callers get a clear diagnostic before setnames() is called.
🧹 Nitpick comments (1)
R/clean_Spectronaut.R (1)
152-156: Unusedcollapseargument and unhelpful error message.
paste0()with a single string doesn't usecollapse. Consider listing available intensity-like columns to help users diagnose the issue.if (!(resolved %in% available_cols)) { - stop(paste0( - "Intensity column '", intensity, "' not found in input data. ", collapse = ", ") - ) + stop(paste0( + "Intensity column '", intensity, "' (resolved to '", resolved, + "') not found in input data.")) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/clean_Spectronaut.R` around lines 152 - 156, The stop message in the block that checks if (resolved %in% available_cols) is unhelpful and incorrectly uses collapse in paste0; update the error to show the requested intensity name (intensity or resolved) and list available intensity-like columns from available_cols so users can diagnose the mismatch. Replace the current paste0(...) call with a construction that concatenates a clear message (e.g., "Intensity column '... ' not found") and then appends the vector of available_cols collapsed with ", " (using paste(..., collapse = ", ")), ensuring you remove the unused collapse argument and reference resolved/intensity and available_cols in the message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@R/clean_Spectronaut.R`:
- Around line 79-84: The error message uses plural verb "are" but refers to a
single column; update the message built in the block that checks
.standardizeColnames(peptideSequenceColumn) against colnames(spec_input) to use
singular agreement (e.g., "The following column is missing from the input
data:") and ensure this updated string is passed to
getOption("MSstatsLog")("ERROR", ...) and stop(...), referencing the
peptideSequenceColumn variable so the logged/stopped text remains informative.
- Around line 191-203: The current logic treats "Lys6" and "[Lys6]" the same;
fix by (1) deriving the one-letter residue(s) from heavyLabels (e.g., convert
"Lys6" -> "K") into bare_amino_acids (remove numeric suffix and map full AA name
to one-letter), (2) build heavy_brackets_escaped_pattern to match bracketed
labels explicitly by wrapping each heavy label in escaped brackets (e.g.,
paste0("\\[", gsub("([\\[\\]])","\\\\\\\\\\1", heavyLabel), "\\]")), and (3)
make bare_amino_acids_pattern match the residue letter only when not inside
brackets (use a negative lookbehind like (?<!\\[) or an equivalent regex) so the
fcase on spec_input[, IsotopeLabelType := ...] correctly sets "H" for bracketed
labels, "L" for bare residue letters, and NA otherwise; update references:
heavyLabels, bare_amino_acids, bare_amino_acids_pattern,
heavy_brackets_escaped_pattern, and the spec_input[, IsotopeLabelType :=
data.table::fcase(...)].
- Around line 61-85: The validation in .validateSpectronautInput currently only
requires "FGCharge" and misses RFileName and RCondition which are later
used/renamed (causing setnames() to fail); update the required_columns vector in
.validateSpectronautInput to include "RFileName" and "RCondition", adjust the
missing_columns/error messages accordingly so callers get a clear diagnostic
before setnames() is called.
---
Nitpick comments:
In `@R/clean_Spectronaut.R`:
- Around line 152-156: The stop message in the block that checks if (resolved
%in% available_cols) is unhelpful and incorrectly uses collapse in paste0;
update the error to show the requested intensity name (intensity or resolved)
and list available intensity-like columns from available_cols so users can
diagnose the mismatch. Replace the current paste0(...) call with a construction
that concatenates a clear message (e.g., "Intensity column '... ' not found")
and then appends the vector of available_cols collapsed with ", " (using
paste(..., collapse = ", ")), ensuring you remove the unused collapse argument
and reference resolved/intensity and available_cols in the message.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d0307056-91f8-4375-9f52-abb56bf0040f
📒 Files selected for processing (2)
R/clean_Spectronaut.Rinst/tinytest/test_converters_SpectronauttoMSstatsFormat.R
🚧 Files skipped from review as they are similar to previous changes (1)
- inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R
0d76e0d to
a65cdc5
Compare
Motivation & Context
The Spectronaut converter lacked flexibility needed for protein-turnover workflows: peptide-sequence sources were fixed, isotope-label classification could not be driven by user-supplied heavy-label patterns, and intensity selection relied on a narrow enum-style mapping. This made it difficult to parse bracketed heavy-label annotations in modified peptide sequences (e.g., K[Lys6]) and to run conversions when Spectronaut column names or turnover-focused MS1 intensity columns were used.
Solution Summary
Introduce configurable peptide-sequence and isotope-label handling plus more robust intensity-column resolution:
peptideSequenceColumnparameter (default "EG.ModifiedSequence") so callers can choose the peptide source column.heavyLabelsparameter (default NULL) that, when provided, parses peptide sequences for bracketed labels and assigns IsotopeLabelType ("H"/"L"/NA) per peptide; when NULL, behavior defaults to treating all peptides as light.Detailed Changes
R/clean_Spectronaut.R
R/converters_SpectronauttoMSstatsFormat.R
New/updated helper functions (R/clean_Spectronaut.R)
Documentation updates
Tests
Unit Tests Added/Modified
Coding Guidelines / Violations