Merge TPR simulation function into Development#6
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds end-to-end TPR power-curve simulation and interactive Plotly plotting, implements concentration-ladder construction and per-configuration simulation, updates experiment mean-profile dose matching, adjusts package Imports/NAMESPACE, and adds documentation plus unit tests for the new features. Changes
Sequence DiagramsequenceDiagram
participant User
participant Runner as run_tpr_simulation
participant Ladder as .build_concentration_ladders
participant OneSim as .run_one_tpr_simulation
participant Simulator as futureExperimentSimulation
participant Plotter as plot_tpr_power_curve
participant GG as ggplot2
participant Plotly as plotly
User->>Runner: call(rep_range, concentrations, dose_range, data, protein, ...)
Runner->>Ladder: generate concentration subsets
Ladder-->>Runner: concentration ladders
Runner->>OneSim: iterate combos (N_rep, NumConcs)
OneSim->>Simulator: simulate experiment (data, params)
Simulator-->>OneSim: Hit_Rates_Data
OneSim-->>Runner: filtered TPR rows (Interaction, TPR, N_rep, NumConcs)
Runner-->>User: combined simulation results (data.frame)
User->>Plotter: call(simulation_results)
Plotter->>GG: build ggplot panel
GG-->>Plotter: ggplot object
Plotter->>Plotly: ggplotly + layout
Plotly-->>Plotter: interactive plot
Plotter-->>User: plotly visualization
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
R/TPR_Power_Curve.R (1)
34-46: Consider simplifying by removing the intermediatesim_argslist.The
sim_argslist is created and then immediately unpacked in the function call. This adds verbosity without benefit.♻️ Simplified version
run_one <- function(n_rep, k_conc, seed = 123) { set.seed(seed + n_rep * 100 + k_conc) concs_k <- CONC_MAP[[as.character(k_conc)]] - sim_args <- list( - N_proteins = n_proteins, - N_rep = n_rep, - Concentrations = concs_k, - IC50_Prediction = FALSE - ) - - temp_res <- futureExperimentSimulation( - N_proteins = sim_args$N_proteins, - N_rep = sim_args$N_rep, - Concentrations = sim_args$Concentrations, - IC50_Prediction = sim_args$IC50_Prediction - ) + temp_res <- futureExperimentSimulation( + N_proteins = n_proteins, + N_rep = n_rep, + Concentrations = concs_k, + IC50_Prediction = FALSE + ) temp_res$Hit_Rates_Data |>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/TPR_Power_Curve.R` around lines 34 - 46, The sim_args list is unnecessary boilerplate: remove the sim_args variable and call futureExperimentSimulation directly with the original variables (n_proteins, n_rep, concs_k, and FALSE) by passing them to the corresponding parameters N_proteins, N_rep, Concentrations, and IC50_Prediction in the futureExperimentSimulation call so you only use the function name futureExperimentSimulation and the original symbols n_proteins, n_rep, concs_k, and FALSE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 91-92: The linetype mapping currently uses a fixed vector ltypes
and assigns ltype_values <- setNames(ltypes[seq_along(rep_levels)],
as.character(rep_levels)), which will produce NAs when length(rep_levels) >
length(ltypes); change the assignment to repeat or recycle ltypes to cover all
replicate levels (e.g., generate a vector of length length(rep_levels) by
repeating ltypes with length.out = length(rep_levels) or using rep(ltypes,
length.out = length(rep_levels))) and then call setNames(...) so ltype_values
maps every element of rep_levels to a valid linetype; refer to the variables
ltypes, ltype_values and rep_levels when making the change.
- Around line 26-28: The function run_tpr_simulation currently assumes rep_range
is a two-element integer vector; add input validation at the top of
run_tpr_simulation to check that rep_range is a length-2 numeric (or integer)
vector with no NA/NaN/Inf values, both elements are whole numbers (or can be
safely coerced to integers), and rep_range[1] <= rep_range[2]; if any check
fails, stop with a clear error message mentioning rep_range and expected form
(e.g., "rep_range must be a length-2 integer vector with min <= max"). After
validation, coerce to integers (if needed) before creating rep_grid to avoid
downstream surprises when using rep_grid.
---
Nitpick comments:
In `@R/TPR_Power_Curve.R`:
- Around line 34-46: The sim_args list is unnecessary boilerplate: remove the
sim_args variable and call futureExperimentSimulation directly with the original
variables (n_proteins, n_rep, concs_k, and FALSE) by passing them to the
corresponding parameters N_proteins, N_rep, Concentrations, and IC50_Prediction
in the futureExperimentSimulation call so you only use the function name
futureExperimentSimulation and the original symbols n_proteins, n_rep, concs_k,
and FALSE.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0b0cfe25-885d-428f-890c-886741a75f85
📒 Files selected for processing (10)
DESCRIPTIONNAMESPACER/TPR_Power_Curve.Rman/ConvertGroupToNumericDose.Rdman/DoseResponseFit.Rdman/FutureExperimentSimulation.Rdman/PredictIC50Parallel.Rdman/VisualizeResponseProtein.Rdman/plot_tpr_power_curve.Rdman/run_tpr_simulation.Rd
💤 Files with no reviewable changes (5)
- man/ConvertGroupToNumericDose.Rd
- man/VisualizeResponseProtein.Rd
- man/PredictIC50Parallel.Rd
- man/DoseResponseFit.Rd
- man/FutureExperimentSimulation.Rd
R/TPR_Power_Curve.R
Outdated
| IC50_Prediction = FALSE | ||
| ) | ||
|
|
||
| temp_res <- futureExperimentSimulation( |
There was a problem hiding this comment.
I'm a little confused. Users should also pass in the actual dataset and the Protein ID that is considered a strong interaction. But I don't see where this is being passed to the function.
R/TPR_Power_Curve.R
Outdated
| "8" = c(0, 1, 10, 30, 100, 300, 1000, 3000), | ||
| "9" = c(0, 1, 3, 10, 30, 100, 300, 1000, 3000) | ||
| ) | ||
|
|
There was a problem hiding this comment.
As discussed with Sarah, this should not be hard coded but rather each subsequent value should be picked based on farthest distance from the log(median) among a user's set of doses, not this hard-coded list.
A user will have any arbitrary set of doses as well (i.e. it's not just 0, 1, 3, 10. You might have 0, 2, 4, 20, ...)
This also means that another input parameter should be the concentrations themselves (e.g. 0,1,3,10,30,100,300,1000,3000) and the dose_range (e.g. c(2,9) being doses 2 through 9).
R/TPR_Power_Curve.R
Outdated
| stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.") | ||
| } | ||
| if (rep_range[2] > 5) { | ||
| stop("Maximum replicates is 5 (limited by available line styles for plotting).") |
There was a problem hiding this comment.
(limited by available line styles for plotting) - if a user sees this, it's not going to make sense. You can remove this part.
I don't think there needs to be a maximum replicate range in this function. It should go inside the plotting function since that's where the plotting limitation
R/TPR_Power_Curve.R
Outdated
| #' Run TPR simulation across a grid of concentration counts and replicate counts | ||
| #' | ||
| #' Sweeps over combinations of dose counts (2-9) and replicate counts, | ||
| #' calling \code{futureExperimentSimulation()} for each combination. |
There was a problem hiding this comment.
People don't know what TPR simulation means or what it means to sweep over dose counts / replicate counts calling this futureExperimentSimulation function.
Can you be more thorough in a way so that a biologist with minimal coding experience would understand what this function does?
R/TPR_Power_Curve.R
Outdated
| #' | ||
| #' @importFrom dplyr filter mutate select if_else | ||
| #' @export | ||
| run_tpr_simulation <- function(rep_range, n_proteins = 1000) { |
There was a problem hiding this comment.
There should be unit tests for this function
R/TPR_Power_Curve.R
Outdated
| if (!is.numeric(rep_range) || length(rep_range) != 2 || rep_range[1] > rep_range[2]) { | ||
| stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.") | ||
| } | ||
| if (rep_range[2] > 5) { |
There was a problem hiding this comment.
Shouldn't this validation be for the difference between max and min? Not the max replicate value.
R/TPR_Power_Curve.R
Outdated
| if (!requireNamespace("plotly", quietly = TRUE)) { | ||
| stop("Package 'plotly' is required for interactive plots. Please install it.") |
There was a problem hiding this comment.
We don't need this because plotly should already be a dependency of this package, meaning users will have it automatically installed when they install MSstatsResponse
R/TPR_Power_Curve.R
Outdated
| } | ||
| ltype_values <- setNames(ltypes[seq_along(rep_levels)], as.character(rep_levels)) | ||
|
|
||
| make_panel <- function(data, color, show_legend = FALSE) { |
There was a problem hiding this comment.
Put this outside of the plotting function instead of inside it and rename it so that it's clear what it's doing, aka plotting the TPR curves using ggplot2/plotly.
| @@ -1,93 +0,0 @@ | |||
| % Generated by roxygen2: do not edit by hand | |||
There was a problem hiding this comment.
All of the documentation files in the man folder should NOT be deleted. Can you fix it so that devtools::document() does not remove these files?
R/TPR_Power_Curve.R
Outdated
| k_grid <- sort(unique(simulation_results$NumConcs)) | ||
| rep_levels <- sort(unique(simulation_results$N_rep)) | ||
|
|
||
| ltypes <- c("dotted", "dotdash", "dashed", "longdash", "solid") |
There was a problem hiding this comment.
On second thought - I thought we discussed doing a color gradient instead? In that case, 5 replicate validation is not needed either.
- Replace with actual dataset and protein ID - Replace hard-coded concentrations with a mechanism to pick based on farthest distance from the log(median) - Replace the linestyles with colot palettes for plot representation and removed maximum replicate rage - Provide examples for TPR simulation usage - Re-write the comments in a biologist friendly tone - Extract the single TPR simulation logic into a helper function - Improve the plotly import method - Write unit test cases for the plotly function - Move plotly from suggested into required dependency and remove the installation check from code
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
R/ExperimentalDesignSimulation.R (1)
249-254: Potential division by zero or NaN when all profile doses are zero.If the user's data contains only control measurements (all
dose_numeric == 0), the condition at line 249 (any(profile$dose_numeric != 0)) will beFALSE, leavingLogIntensitiesasNAfor non-zero target concentrations. The subsequent fill loop (lines 258-305) handles this, but it's worth noting that this relies on the fallback behavior.This is acceptable given the fallback logic, but consider adding a warning when no non-zero doses exist in the profile data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/ExperimentalDesignSimulation.R` around lines 249 - 254, The code currently skips filling LogIntensities when a profile has no non-zero doses (the else branch when any(profile$dose_numeric != 0) is FALSE), which can leave NA values relying on later fallback; add an explicit check for the all-zero-doses case (using profile$dose_numeric) and emit a clear warning (e.g., via warning()) that this profile contains only control measurements so non-zero target concentrations cannot be matched, referencing the profile identifier (or row index) and the object names used here (profile, non_zero, complete_profile$LogIntensities) so the user can locate the problematic profile; do not change the existing fallback logic, only add the warning in the branch where no non-zero doses exist.tests/testthat/test-TPR_Power_Curve.R (1)
92-105: Consider adding a test for thedataprovided withoutproteincase.This test verifies that
data=NULL, protein=NULLworks, but doesn't test the case wheredatais provided withoutprotein. Based onfutureExperimentSimulation's validation, this should fail with "must specify at least one" error.🧪 Proposed additional test
test_that("run_tpr_simulation errors when data provided without protein", { mock_data <- data.frame( protein = "P1", drug = c("DMSO", "Drug1"), dose = c(0, 1e-6), response = c(20, 18) ) expect_error( run_tpr_simulation( rep_range = c(1, 1), concentrations = c(0, 100, 1000), dose_range = c(2, 3), n_proteins = 10, data = mock_data, protein = NULL ), "must specify" ) })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test-TPR_Power_Curve.R` around lines 92 - 105, Add a test that calls run_tpr_simulation with a non-NULL data.frame but protein = NULL and asserts it errors with the "must specify" message; create a small mock_data (columns: protein, drug, dose, response) and wrap the call in expect_error(..., "must specify") to mirror futureExperimentSimulation's validation and ensure run_tpr_simulation surfaces the expected validation error.R/TPR_Power_Curve.R (2)
76-78: Seed collision possible for different parameter combinations.The seed formula
seed + n_rep * 100 + length(concs)can produce identical seeds for different(n_rep, concs)pairs. For example,(n_rep=1, length=102)and(n_rep=2, length=2)both produceseed + 202.Consider using a hash-based approach or including both values in a way that prevents collision:
♻️ Proposed fix for unique seeding
.run_one_tpr_simulation <- function(n_rep, concs, n_proteins, data = NULL, protein = NULL, seed = 123) { - set.seed(seed + n_rep * 100 + length(concs)) + # Use prime multiplier to reduce collision likelihood + set.seed(seed + n_rep * 1009 + length(concs) * 17)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/TPR_Power_Curve.R` around lines 76 - 78, The current seed formula in .run_one_tpr_simulation (seed + n_rep * 100 + length(concs)) can collide for different (n_rep, concs) pairs; replace it with a collision-resistant seed derived from hashing the combined parameters: compute a new_seed by hashing a string built from seed, n_rep and the full concs vector (e.g., paste(seed, n_rep, paste(concs, collapse=","))) using a stable hash (digest::digest with xxhash32 or sha1), convert a portion of the hex hash to an integer (strtoi(substr(...), 16L)) and use set.seed(new_seed); update NAMESPACE/DESCRIPTION or imports to include digest if not already present.
284-299: Plot only shows "Strong" interaction results.The function filters to
Interaction == "Strong"at line 288, discarding "Weak" interaction data. The documentation at lines 260-261 mentions "user-selected protein template" but doesn't clarify that only strong interactions are visualized.If this is intentional, consider updating the documentation. If users might want to see both interaction types, consider adding a parameter to control this.
📝 Documentation clarification
#' Creates an interactive plot showing how the true positive rate #' (detection power) changes with the number of doses and replicates -#' for the user-selected protein template. Line color shading goes +#' for strong interactions only. Line color shading goes #' from light (fewest replicates) to dark (most replicates). +#' +#' Note: This function currently visualizes only the "Strong" interaction +#' category. Weak interaction results from \code{run_tpr_simulation} are +#' not displayed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/TPR_Power_Curve.R` around lines 284 - 299, The function plot_tpr_power_curve currently hard-filters simulation_results to Interaction == "Strong" (results_protein), which drops "Weak" data; either make this behavior configurable by adding an argument (e.g., interaction_type or include_interactions) to plot_tpr_power_curve and use that parameter when subsetting simulation_results, or if the plot must remain Strong-only, update the function documentation and comment to state explicitly that only "Strong" interactions are plotted; locate the filtering logic around results_protein and the function signature plot_tpr_power_curve to implement the parameter or adjust the docs accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/ExperimentalDesignSimulation.R`:
- Around line 336-349: The default_template filtering using exact matching
(default_template$weak_interaction[default_template$weak_interaction$dose %in%
concentrations, ] and similarly for default_template$no_interaction) can return
zero rows for non-standard concentrations; change the logic in the
weak_interaction and no_interaction branches to first attempt the exact filter
and if the result has zero rows fall back to calling .getMeanProfile(NULL,
drug_data, concentrations) (or otherwise map concentrations to nearest-template
doses) so that the returned profile always has length(concentrations),
referencing weak_interaction, no_interaction, default_template, .getMeanProfile
and concentrations to locate the code to modify.
In `@R/TPR_Power_Curve.R`:
- Around line 86-91: When building sim_args in TPR_Power_Curve.R, ensure we
validate the dependency between data and protein: if data is provided
(sim_args$data set) but protein is NULL, either require the caller to supply
protein or provide a clear error; add a check after setting sim_args$data that
stops with a descriptive message (referencing futureExperimentSimulation's
requirement) if protein is NULL, or alternatively set
sim_args$strong_proteins/weak_proteins/no_interaction_proteins appropriately;
update the validation around sim_args and the parameter named protein so
futureExperimentSimulation won't error unexpectedly.
---
Nitpick comments:
In `@R/ExperimentalDesignSimulation.R`:
- Around line 249-254: The code currently skips filling LogIntensities when a
profile has no non-zero doses (the else branch when any(profile$dose_numeric !=
0) is FALSE), which can leave NA values relying on later fallback; add an
explicit check for the all-zero-doses case (using profile$dose_numeric) and emit
a clear warning (e.g., via warning()) that this profile contains only control
measurements so non-zero target concentrations cannot be matched, referencing
the profile identifier (or row index) and the object names used here (profile,
non_zero, complete_profile$LogIntensities) so the user can locate the
problematic profile; do not change the existing fallback logic, only add the
warning in the branch where no non-zero doses exist.
In `@R/TPR_Power_Curve.R`:
- Around line 76-78: The current seed formula in .run_one_tpr_simulation (seed +
n_rep * 100 + length(concs)) can collide for different (n_rep, concs) pairs;
replace it with a collision-resistant seed derived from hashing the combined
parameters: compute a new_seed by hashing a string built from seed, n_rep and
the full concs vector (e.g., paste(seed, n_rep, paste(concs, collapse=",")))
using a stable hash (digest::digest with xxhash32 or sha1), convert a portion of
the hex hash to an integer (strtoi(substr(...), 16L)) and use
set.seed(new_seed); update NAMESPACE/DESCRIPTION or imports to include digest if
not already present.
- Around line 284-299: The function plot_tpr_power_curve currently hard-filters
simulation_results to Interaction == "Strong" (results_protein), which drops
"Weak" data; either make this behavior configurable by adding an argument (e.g.,
interaction_type or include_interactions) to plot_tpr_power_curve and use that
parameter when subsetting simulation_results, or if the plot must remain
Strong-only, update the function documentation and comment to state explicitly
that only "Strong" interactions are plotted; locate the filtering logic around
results_protein and the function signature plot_tpr_power_curve to implement the
parameter or adjust the docs accordingly.
In `@tests/testthat/test-TPR_Power_Curve.R`:
- Around line 92-105: Add a test that calls run_tpr_simulation with a non-NULL
data.frame but protein = NULL and asserts it errors with the "must specify"
message; create a small mock_data (columns: protein, drug, dose, response) and
wrap the call in expect_error(..., "must specify") to mirror
futureExperimentSimulation's validation and ensure run_tpr_simulation surfaces
the expected validation error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fb2bd13d-17ca-4496-9c82-a9a402fcfaf4
📒 Files selected for processing (9)
DESCRIPTIONNAMESPACER/ExperimentalDesignSimulation.RR/TPR_Power_Curve.Rman/DoseResponseFit.Rdman/plot_tpr_power_curve.Rdman/run_tpr_simulation.Rdtests/testthat/test-ExperimentalDesignSimulation.Rtests/testthat/test-TPR_Power_Curve.R
✅ Files skipped from review due to trivial changes (3)
- man/DoseResponseFit.Rd
- man/plot_tpr_power_curve.Rd
- man/run_tpr_simulation.Rd
🚧 Files skipped from review as they are similar to previous changes (2)
- DESCRIPTION
- NAMESPACE
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 25-27: The code currently sets control <- min(concentrations),
which contradicts the docstring that control must be 0; change this to
explicitly require/control for 0: validate that 0 is present in the
concentrations vector at the start of the function (stop with an informative
error if not), then set control <- 0 (keep highest <- max(concentrations) and
compute candidates <- setdiff(concentrations, c(control, highest))). Use the
variable names concentrations, control, highest, and candidates to locate and
modify the logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e78deb24-c9be-4966-a4b6-7fbb124766bf
📒 Files selected for processing (4)
R/ExperimentalDesignSimulation.RR/TPR_Power_Curve.Rman/plot_tpr_power_curve.Rdtests/testthat/test-TPR_Power_Curve.R
✅ Files skipped from review due to trivial changes (2)
- man/plot_tpr_power_curve.Rd
- tests/testthat/test-TPR_Power_Curve.R
R/ExperimentalDesignSimulation.R
Outdated
| for (i in seq_len(nrow(complete_profile))) { | ||
| target = complete_profile$dose[i] | ||
| if (target == 0) { | ||
| # Control: match any zero-dose entry | ||
| zero_match = profile[profile$dose_numeric == 0, ] | ||
| if (nrow(zero_match) > 0) { | ||
| complete_profile$LogIntensities[i] = zero_match$LogIntensities[1] | ||
| } | ||
| } else if (any(profile$dose_numeric != 0)) { | ||
| # Non-zero: find closest on log scale | ||
| non_zero = profile[profile$dose_numeric != 0, ] | ||
| log_ratios = abs(log10(non_zero$dose_numeric) - log10(target)) | ||
| best = which.min(log_ratios) | ||
| complete_profile$LogIntensities[i] = non_zero$LogIntensities[best] | ||
| } | ||
| } |
There was a problem hiding this comment.
I think this loop is overly complex. Shouldn't the dose_str column of the profile variable match the string version of the complete_profile$dose column? This could be a simple left_join if you have a string version of the dose column in complete_profile.
R/ExperimentalDesignSimulation.R
Outdated
| complete_profile = complete_profile %>% | ||
| left_join(profile, by = "dose") | ||
| if (nrow(profile) > 0 && !any(profile$dose_numeric != 0)) { | ||
| warning("No non-zero dose measurements found for the selected protein. Template will use fallback values.") |
There was a problem hiding this comment.
I think this should be a stop instead of a warning. It doesn't make sense to be able to make simulations for sample size calculation if your dataset has only dose 0.
R/ExperimentalDesignSimulation.R
Outdated
| default_template <- NULL | ||
| if (is.null(weak_proteins) || is.null(no_interaction_proteins)) { | ||
| template1_path <- system.file("extdata", "template1.RDS", package = "MSstatsResponse") | ||
| if (file.exists(template1_path)) { | ||
| default_template <- readRDS(template1_path) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
This seems to be a hard-coded template specific to the dataset you were given, but not extensible to other datasets. I think if a weak interaction / no interaction protein is not defined, we should fallback to giving the baseline profile.
R/ExperimentalDesignSimulation.R
Outdated
| weak_interaction = if (!is.null(weak_proteins)) { | ||
| .getMeanProfile(weak_proteins, drug_data, concentrations) | ||
| } else if (!is.null(default_template)) { | ||
| default_template$weak_interaction[default_template$weak_interaction$dose %in% concentrations, ] | ||
| } else { | ||
| .getMeanProfile(NULL, drug_data, concentrations) | ||
| }, | ||
| no_interaction = if (!is.null(no_interaction_proteins)) { | ||
| .getMeanProfile(no_interaction_proteins, drug_data, concentrations) | ||
| } else if (!is.null(default_template)) { | ||
| default_template$no_interaction[default_template$no_interaction$dose %in% concentrations, ] | ||
| } else { | ||
| .getMeanProfile(NULL, drug_data, concentrations) |
There was a problem hiding this comment.
Will anything break in the package if weak_interaction / no_interaction is set to the baseline profile (given we never provided any weak interaction / no interaction protein)?
|
|
||
| # Extract templates: use user data for specified proteins, defaults for others | ||
| template = list( | ||
| strong_interaction = .getMeanProfile(strong_proteins, drug_data, concentrations), |
There was a problem hiding this comment.
I'm a little confused, what does .getMeanProfile return when you defined a non-null strong_proteins? Reading the function, it doesn't return anything.
There was a problem hiding this comment.
When protein_ids is non-null, .getMeanProfile returns a data.frame with columns dose, Intensity, LogIntensities, ratio - the function filters drug_data for those proteins, groups by dose, computes mean LogIntensities, then fills missing doses via interpolation. The return is at the end of the function.
| result <- selection_order[as.character(k_range)] | ||
| result <- result[!sapply(result, is.null)] | ||
|
|
||
| if (length(result) == 0) { |
There was a problem hiding this comment.
Under what circumstances does this occur? Because it would be helpful to have validation earlier on in the function that this ladder can't be constructed, this error message makes it unclear what the user needs to do.
| #' @param data Optional prepared dose-response data for user-based templates. | ||
| #' @param protein Optional protein ID for strong interaction template. |
There was a problem hiding this comment.
I think these parameters should be required.
Previously, it appeared like the could be optional because we had the template.rds file in the package, but that file is specific to a dataset that Sarah had access to. I think privacy issues made it so that Sarah couldn't put the full dataset and run everything E2E in the package.
R/TPR_Power_Curve.R
Outdated
| #' @noRd | ||
| .run_one_tpr_simulation <- function(n_rep, concs, n_proteins, data = NULL, | ||
| protein = NULL, seed = 123) { | ||
| set.seed(seed + n_rep * 100 + length(concs)) |
There was a problem hiding this comment.
I think we can just do set.seed(seed). Users can always adjust their seed number based on number of replicates / length of concentration
R/TPR_Power_Curve.R
Outdated
|
|
||
| temp_res <- do.call(futureExperimentSimulation, sim_args) | ||
| temp_res$Hit_Rates_Data |> | ||
| dplyr::filter(Category %in% c("TPR (Strong)", "TPR (Weak)")) |> |
There was a problem hiding this comment.
I think it's safe to just filter only "TPR (Strong)" here.
R/TPR_Power_Curve.R
Outdated
| #' @param data Optional. User's prepared dose-response data (e.g., from | ||
| #' \code{MSstatsPrepareDoseResponseFit}). If provided, protein-specific | ||
| #' templates are extracted from this data instead of using defaults. | ||
| #' @param protein Optional. Character string specifying a protein ID to use as | ||
| #' the strong interaction template. Only used when \code{data} is provided. |
There was a problem hiding this comment.
As mentioned in another comment, should be required
- Simplify the profile matching loop - Use Stop instead of warning if there are no non-zero dose - Remove template1.RDS fallback, use baseline - Enhance the message for not generating any valid concentration subsets - Make data and protein required and remove superfluous validations - Update corresponding docs for turning the optional fields mandatory - Simplify seed in one tpr simulation - Filter only TPR strong - Update the test cases
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
R/TPR_Power_Curve.R (2)
288-304: Consider adding a guard for empty results.If
simulation_resultscontains no rows whereInteraction == "Strong"(e.g., due to upstream data issues),results_proteinwill be empty and the plot will render without data or error message. This edge case is unlikely given the current pipeline but could be confusing if encountered.🛡️ Optional: Add validation for empty data
plot_tpr_power_curve <- function(simulation_results) { k_grid <- sort(unique(simulation_results$NumConcs)) # Use only the Strong interaction results (user-selected protein template) results_protein <- simulation_results[simulation_results$Interaction == "Strong", ] + + if (nrow(results_protein) == 0) { + stop("No 'Strong' interaction results found in simulation_results.") + } p <- .make_tpr_panel(results_protein, "#a6dba0", "#1b7837", k_grid, TRUE)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/TPR_Power_Curve.R` around lines 288 - 304, Add a guard in plot_tpr_power_curve to handle the case where results_protein (simulation_results filtered by Interaction == "Strong") is empty: detect if results_protein has zero rows and either abort with a clear message (e.g., stop or warning) or return a simple empty/placeholder plotly object with an explanatory annotation, instead of calling .make_tpr_panel with empty data; reference results_protein, plot_tpr_power_curve and .make_tpr_panel to locate where to insert the check and subsequent early return.
97-104: Simplify redundantif_elseto direct assignment.Since line 98 filters to only
Category == "TPR (Strong)", theif_elseat line 102 will always evaluate to"Strong". The"Weak"branch is dead code.♻️ Proposed simplification
temp_res$Hit_Rates_Data |> dplyr::filter(Category == "TPR (Strong)") |> dplyr::mutate( N_rep = n_rep, NumConcs = length(concs), - Interaction = dplyr::if_else(Category == "TPR (Strong)", "Strong", "Weak") + Interaction = "Strong" ) |> dplyr::select(Interaction, TPR = Percent, N_rep, NumConcs)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/TPR_Power_Curve.R` around lines 97 - 104, The mutate call on temp_res$Hit_Rates_Data sets Interaction using dplyr::if_else but the preceding dplyr::filter(Category == "TPR (Strong)") ensures Category is always "TPR (Strong)", so replace the conditional with a direct assignment: in the dplyr::mutate that defines Interaction (within the pipeline starting at temp_res$Hit_Rates_Data |> ...), set Interaction = "Strong" instead of using dplyr::if_else to remove the dead "Weak" branch and simplify the expression.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 148-156: The example fails because run_tpr_simulation requires
arguments data and protein with no defaults; update the function signature for
run_tpr_simulation to set data = NULL and protein = NULL (or alternatively
update the example to supply minimal mock data/protein or wrap the example in
\dontrun{}), and ensure any internal code in run_tpr_simulation (e.g., checks or
operations that assume non-NULL data/protein) handles NULL appropriately so the
examples run without error.
---
Nitpick comments:
In `@R/TPR_Power_Curve.R`:
- Around line 288-304: Add a guard in plot_tpr_power_curve to handle the case
where results_protein (simulation_results filtered by Interaction == "Strong")
is empty: detect if results_protein has zero rows and either abort with a clear
message (e.g., stop or warning) or return a simple empty/placeholder plotly
object with an explanatory annotation, instead of calling .make_tpr_panel with
empty data; reference results_protein, plot_tpr_power_curve and .make_tpr_panel
to locate where to insert the check and subsequent early return.
- Around line 97-104: The mutate call on temp_res$Hit_Rates_Data sets
Interaction using dplyr::if_else but the preceding dplyr::filter(Category ==
"TPR (Strong)") ensures Category is always "TPR (Strong)", so replace the
conditional with a direct assignment: in the dplyr::mutate that defines
Interaction (within the pipeline starting at temp_res$Hit_Rates_Data |> ...),
set Interaction = "Strong" instead of using dplyr::if_else to remove the dead
"Weak" branch and simplify the expression.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bc5f754e-c855-4899-8011-be5b09124d28
📒 Files selected for processing (5)
R/ExperimentalDesignSimulation.RR/TPR_Power_Curve.Rman/run_tpr_simulation.Rdtests/testthat/test-ExperimentalDesignSimulation.Rtests/testthat/test-TPR_Power_Curve.R
✅ Files skipped from review due to trivial changes (3)
- R/ExperimentalDesignSimulation.R
- man/run_tpr_simulation.Rd
- tests/testthat/test-TPR_Power_Curve.R
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/testthat/test-ExperimentalDesignSimulation.R
R/TPR_Power_Curve.R
Outdated
| #' @examples | ||
| #' # Quick example with small simulation (for speed) | ||
| #' results <- run_tpr_simulation( | ||
| #' rep_range = c(1, 2), | ||
| #' concentrations = c(0, 10, 100, 1000), | ||
| #' dose_range = c(2, 4), | ||
| #' n_proteins = 50 | ||
| #' ) | ||
| #' head(results) |
There was a problem hiding this comment.
Documentation example will fail: data and protein are required but not provided.
The function signature requires data and protein (no default values), but the "Quick example" omits them. Running this example will error:
Error in run_tpr_simulation(rep_range = c(1, 2), ...) :
argument "data" is missing, with no default
Either add data = NULL, protein = NULL defaults to the function signature, or update the example to include placeholder/mock data.
🛡️ Option 1: Make parameters optional with NULL defaults
-run_tpr_simulation <- function(rep_range, concentrations, dose_range,
- data, protein, n_proteins = 1000) {
+run_tpr_simulation <- function(rep_range, concentrations, dose_range,
+ data = NULL, protein = NULL, n_proteins = 1000) {📝 Option 2: Wrap the example in \dontrun{}
-#' # Quick example with small simulation (for speed)
-#' results <- run_tpr_simulation(
+#' \dontrun{
+#' # Example with user-provided data
+#' results <- run_tpr_simulation(
#' rep_range = c(1, 2),
#' concentrations = c(0, 10, 100, 1000),
#' dose_range = c(2, 4),
+#' data = my_prepared_data,
+#' protein = "P12345",
#' n_proteins = 50
#' )
-#' head(results)
+#' }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #' @examples | |
| #' # Quick example with small simulation (for speed) | |
| #' results <- run_tpr_simulation( | |
| #' rep_range = c(1, 2), | |
| #' concentrations = c(0, 10, 100, 1000), | |
| #' dose_range = c(2, 4), | |
| #' n_proteins = 50 | |
| #' ) | |
| #' head(results) | |
| #' `@examples` | |
| #' \dontrun{ | |
| #' # Example with user-provided data | |
| #' results <- run_tpr_simulation( | |
| #' rep_range = c(1, 2), | |
| #' concentrations = c(0, 10, 100, 1000), | |
| #' dose_range = c(2, 4), | |
| #' data = my_prepared_data, | |
| #' protein = "P12345", | |
| #' n_proteins = 50 | |
| #' ) | |
| #' } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/TPR_Power_Curve.R` around lines 148 - 156, The example fails because
run_tpr_simulation requires arguments data and protein with no defaults; update
the function signature for run_tpr_simulation to set data = NULL and protein =
NULL (or alternatively update the example to supply minimal mock data/protein or
wrap the example in \dontrun{}), and ensure any internal code in
run_tpr_simulation (e.g., checks or operations that assume non-NULL
data/protein) handles NULL appropriately so the examples run without error.
Transfer the TPR simulation function from MSstatsShiny and export it for external use
Motivation and Context
This PR ports a TPR (True Positive Rate) simulation workflow from MSstatsShiny into MSstatsResponse and exposes it as a reusable public API. Motivation: provide programmatic, testable simulation of detection power across replicate counts and dose configurations plus an interactive visualization. Solution: implement concentration-ladder generation, orchestrate many simulations (via futureExperimentSimulation), expose run_tpr_simulation() and plot_tpr_power_curve(), add docs and unit tests.
Detailed Changes
DESCRIPTION
NAMESPACE / Public API
New file R/TPR_Power_Curve.R
R/ExperimentalDesignSimulation.R
Documentation
Unit Tests Added / Modified
Added tests/testthat/test-TPR_Power_Curve.R
Updated tests/testthat/test-ExperimentalDesignSimulation.R
Coding Guidelines / Policy Notes (violations or breaking changes)