Merge TPR simulation function into Development by swaraj-neu · Pull Request #6 · Vitek-Lab/MSstatsResponse

swaraj-neu · 2026-03-20T20:50:39Z

Transfer the TPR simulation function from MSstatsShiny and export it for external use

Motivation and Context

This PR ports a TPR (True Positive Rate) simulation workflow from MSstatsShiny into MSstatsResponse and exposes it as a reusable public API. Motivation: provide programmatic, testable simulation of detection power across replicate counts and dose configurations plus an interactive visualization. Solution: implement concentration-ladder generation, orchestrate many simulations (via futureExperimentSimulation), expose run_tpr_simulation() and plot_tpr_power_curve(), add docs and unit tests.

Detailed Changes

DESCRIPTION
- Added plotly to Imports (plotly becomes a runtime dependency).
NAMESPACE / Public API
- Exported run_tpr_simulation(rep_range, concentrations, dose_range, data, protein, n_proteins = 1000).
- Exported plot_tpr_power_curve(simulation_results).
- Added imports: dplyr::if_else, ggplot2::theme_bw, grDevices::colorRampPalette, plotly::ggplotly, plotly::layout.
New file R/TPR_Power_Curve.R
- .build_concentration_ladders(concentrations, dose_range): sorts/deduplicates concentrations, requires at least two unique values including control (0), always includes min (0) and max, greedily selects intermediate doses by maximal distance in log10(conc + 1) from current subset median, returns named list keyed by subset size; errors if no subsets possible.
- .run_one_tpr_simulation(n_rep, concs, n_proteins, data, protein, seed=123): deterministic seeding, constructs args for futureExperimentSimulation (N_proteins, N_rep, Concentrations, IC50_Prediction = FALSE), passes data and strong_proteins, filters Hit_Rates_Data to Category == "TPR (Strong)" and maps Interaction to "Strong"/"Weak", returns Interaction / TPR (Percent) / N_rep / NumConcs.
- run_tpr_simulation(rep_range, concentrations, dose_range, data, protein, n_proteins=1000): validates rep_range and dose_range (length-2 numeric, min<=max, dose_range[1] >= 2), builds concentration ladders, expands grid of N_rep × NumConcs, runs .run_one_tpr_simulation() for each config with tryCatch, aggregates successful runs into a data.frame; errors if all runs fail.
- plot_tpr_power_curve(simulation_results): filters results to Interaction == "Strong", computes x breaks from sorted unique NumConcs, creates ggplot panel (TPR vs NumConcs) colored by N_rep with light→dark gradient, converts to interactive plotly object and applies annotation/layout tweaks.
R/ExperimentalDesignSimulation.R
- .getMeanProfile(): replaced numeric rounding-based grouping with string-based dose grouping and nearest-neighbor/interpolation logic to align observed mean LogIntensities to requested concentrations; added guard to Stop when a non-empty profile contains no non-zero doses.
- (Note: prior behavior/fallback to external template file was removed per commit messages—template fallback behavior changed to use baseline.)
Documentation
- Added man/run_tpr_simulation.Rd (usage, parameters, behavior, examples).
- Added man/plot_tpr_power_curve.Rd (usage, parameters, return value, example).
- Updated man/DoseResponseFit.Rd example: filter uses adj.pvalue rather than adjust_pval.

Unit Tests Added / Modified

Added tests/testthat/test-TPR_Power_Curve.R
- Tests .build_concentration_ladders: verifies expected subset counts and names, that each subset includes control (0) and max dose, errors on <2 unique concentrations, errors when 0 absent, and ensures subset elements come from input concentrations.
- run_tpr_simulation argument validation: errors for reversed or non-numeric rep_range; errors for invalid/reversed dose_range and insufficient bounds.
- run_tpr_simulation output checks: returns data.frame with columns Interaction, TPR, N_rep, NumConcs; Interaction restricted to "Strong" (tests filter behavior); TPR within [0,100].
- plot_tpr_power_curve returns an object inheriting from plotly when given simulation results.
Updated tests/testthat/test-ExperimentalDesignSimulation.R
- Adjusted interpolation/missing-dose test to expect endpoint 3000 and exact dose vector c(0,10,100,1000,3000).
- Expanded empty-template tests to assert templates for weak_interaction and no_interaction have 2 rows and required columns, and verify ratio values.

Coding Guidelines / Policy Notes (violations or breaking changes)

API stability: run_tpr_simulation now requires data and protein positional arguments (no longer optional/defaulting to NULL). This is a breaking change for callers that previously relied on data = NULL and protein = NULL defaults.
Behavior change: template fallback logic was changed/removed (commit notes indicate removed template1.RDS fallback and use baseline), which alters external behavior for cases that previously relied on that fallback.
Error handling: stricter failure modes introduced (Stop is used instead of warnings when no non-zero dose is present and when all simulations fail), which may cause immediate errors where previous behavior only warned.
Dependency addition: plotly added as a hard runtime dependency; callers/environments must have plotly available.

…for external use

coderabbitai · 2026-03-20T20:51:33Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds end-to-end TPR power-curve simulation and interactive Plotly plotting, implements concentration-ladder construction and per-configuration simulation, updates experiment mean-profile dose matching, adjusts package Imports/NAMESPACE, and adds documentation plus unit tests for the new features.

Changes

Cohort / File(s)	Summary
TPR Power-Curve Implementation `R/TPR_Power_Curve.R`	New module providing `.build_concentration_ladders()`, `.run_one_tpr_simulation()`, exported `run_tpr_simulation()` and exported `plot_tpr_power_curve()` (ggplot → plotly conversion).
Package Metadata & NAMESPACE `DESCRIPTION`, `NAMESPACE`	Added `plotly` to `Imports`; exported `run_tpr_simulation` and `plot_tpr_power_curve`; added imports (`if_else`, `theme_bw`, `colorRampPalette`, `ggplotly`, `layout`).
Documentation `man/run_tpr_simulation.Rd`, `man/plot_tpr_power_curve.Rd`, `man/DoseResponseFit.Rd`	Added man pages for `run_tpr_simulation` and `plot_tpr_power_curve`; corrected example column name in `DoseResponseFit.Rd`.
Experimental Design `R/ExperimentalDesignSimulation.R`	Changed `.getMeanProfile()` to string-based dose grouping, assign `dose_numeric` from first group, join by stringified doses and fill missing LogIntensities via interpolation/NN; added guard for missing non-zero doses.
Tests `tests/testthat/test-TPR_Power_Curve.R`, `tests/testthat/test-ExperimentalDesignSimulation.R`	Added tests for ladder generation, `run_tpr_simulation` validation/output structure, plotting returns plotly object; adjusted experimental-design tests to match new dose templates and additional assertions.

Sequence Diagram

sequenceDiagram
    participant User
    participant Runner as run_tpr_simulation
    participant Ladder as .build_concentration_ladders
    participant OneSim as .run_one_tpr_simulation
    participant Simulator as futureExperimentSimulation
    participant Plotter as plot_tpr_power_curve
    participant GG as ggplot2
    participant Plotly as plotly

    User->>Runner: call(rep_range, concentrations, dose_range, data, protein, ...)
    Runner->>Ladder: generate concentration subsets
    Ladder-->>Runner: concentration ladders
    Runner->>OneSim: iterate combos (N_rep, NumConcs)
    OneSim->>Simulator: simulate experiment (data, params)
    Simulator-->>OneSim: Hit_Rates_Data
    OneSim-->>Runner: filtered TPR rows (Interaction, TPR, N_rep, NumConcs)
    Runner-->>User: combined simulation results (data.frame)

    User->>Plotter: call(simulation_results)
    Plotter->>GG: build ggplot panel
    GG-->>Plotter: ggplot object
    Plotter->>Plotly: ggplotly + layout
    Plotly-->>Plotter: interactive plot
    Plotter-->>User: plotly visualization

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 Hopping doses in tidy rows,

I pick ladders where the log-median goes,
I seed simulations, count TPR light,
I draw curves that glow from faint to bright,
A little rabbit cheers the plots tonight.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Merge TPR simulation function into Development' accurately reflects the main change: migrating and exporting a TPR simulation function from MSstatsShiny into this repository, making it publicly available.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/tpr-function

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

R/TPR_Power_Curve.R (1)

34-46: Consider simplifying by removing the intermediate sim_args list.

The sim_args list is created and then immediately unpacked in the function call. This adds verbosity without benefit.

♻️ Simplified version

   run_one <- function(n_rep, k_conc, seed = 123) {
     set.seed(seed + n_rep * 100 + k_conc)
     concs_k <- CONC_MAP[[as.character(k_conc)]]
 
-    sim_args <- list(
-      N_proteins = n_proteins,
-      N_rep = n_rep,
-      Concentrations = concs_k,
-      IC50_Prediction = FALSE
-    )
-
-    temp_res <- futureExperimentSimulation(
-      N_proteins = sim_args$N_proteins,
-      N_rep = sim_args$N_rep,
-      Concentrations = sim_args$Concentrations,
-      IC50_Prediction = sim_args$IC50_Prediction
-    )
+    temp_res <- futureExperimentSimulation(
+      N_proteins = n_proteins,
+      N_rep = n_rep,
+      Concentrations = concs_k,
+      IC50_Prediction = FALSE
+    )
     temp_res$Hit_Rates_Data |>

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 34 - 46, The sim_args list is unnecessary
boilerplate: remove the sim_args variable and call futureExperimentSimulation
directly with the original variables (n_proteins, n_rep, concs_k, and FALSE) by
passing them to the corresponding parameters N_proteins, N_rep, Concentrations,
and IC50_Prediction in the futureExperimentSimulation call so you only use the
function name futureExperimentSimulation and the original symbols n_proteins,
n_rep, concs_k, and FALSE.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 91-92: The linetype mapping currently uses a fixed vector ltypes
and assigns ltype_values <- setNames(ltypes[seq_along(rep_levels)],
as.character(rep_levels)), which will produce NAs when length(rep_levels) >
length(ltypes); change the assignment to repeat or recycle ltypes to cover all
replicate levels (e.g., generate a vector of length length(rep_levels) by
repeating ltypes with length.out = length(rep_levels) or using rep(ltypes,
length.out = length(rep_levels))) and then call setNames(...) so ltype_values
maps every element of rep_levels to a valid linetype; refer to the variables
ltypes, ltype_values and rep_levels when making the change.
- Around line 26-28: The function run_tpr_simulation currently assumes rep_range
is a two-element integer vector; add input validation at the top of
run_tpr_simulation to check that rep_range is a length-2 numeric (or integer)
vector with no NA/NaN/Inf values, both elements are whole numbers (or can be
safely coerced to integers), and rep_range[1] <= rep_range[2]; if any check
fails, stop with a clear error message mentioning rep_range and expected form
(e.g., "rep_range must be a length-2 integer vector with min <= max"). After
validation, coerce to integers (if needed) before creating rep_grid to avoid
downstream surprises when using rep_grid.

---

Nitpick comments:
In `@R/TPR_Power_Curve.R`:
- Around line 34-46: The sim_args list is unnecessary boilerplate: remove the
sim_args variable and call futureExperimentSimulation directly with the original
variables (n_proteins, n_rep, concs_k, and FALSE) by passing them to the
corresponding parameters N_proteins, N_rep, Concentrations, and IC50_Prediction
in the futureExperimentSimulation call so you only use the function name
futureExperimentSimulation and the original symbols n_proteins, n_rep, concs_k,
and FALSE.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b0cfe25-885d-428f-890c-886741a75f85

📥 Commits

Reviewing files that changed from the base of the PR and between 8739ce4 and 5a26bdc.

📒 Files selected for processing (10)

DESCRIPTION
NAMESPACE
R/TPR_Power_Curve.R
man/ConvertGroupToNumericDose.Rd
man/DoseResponseFit.Rd
man/FutureExperimentSimulation.Rd
man/PredictIC50Parallel.Rd
man/VisualizeResponseProtein.Rd
man/plot_tpr_power_curve.Rd
man/run_tpr_simulation.Rd

💤 Files with no reviewable changes (5)

man/ConvertGroupToNumericDose.Rd
man/VisualizeResponseProtein.Rd
man/PredictIC50Parallel.Rd
man/DoseResponseFit.Rd
man/FutureExperimentSimulation.Rd

R/TPR_Power_Curve.R

tonywu1999 · 2026-03-22T22:09:24Z

R/TPR_Power_Curve.R

+      IC50_Prediction = FALSE
+    )
+
+    temp_res <- futureExperimentSimulation(


I'm a little confused. Users should also pass in the actual dataset and the Protein ID that is considered a strong interaction. But I don't see where this is being passed to the function.

tonywu1999 · 2026-03-22T22:10:50Z

R/TPR_Power_Curve.R

+  "8" = c(0, 1, 10, 30, 100, 300, 1000, 3000),
+  "9" = c(0, 1, 3, 10, 30, 100, 300, 1000, 3000)
+)
+


As discussed with Sarah, this should not be hard coded but rather each subsequent value should be picked based on farthest distance from the log(median) among a user's set of doses, not this hard-coded list.

A user will have any arbitrary set of doses as well (i.e. it's not just 0, 1, 3, 10. You might have 0, 2, 4, 20, ...)

This also means that another input parameter should be the concentrations themselves (e.g. 0,1,3,10,30,100,300,1000,3000) and the dose_range (e.g. c(2,9) being doses 2 through 9).

tonywu1999 · 2026-03-22T22:16:52Z

R/TPR_Power_Curve.R

+    stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.")
+  }
+  if (rep_range[2] > 5) {
+    stop("Maximum replicates is 5 (limited by available line styles for plotting).")


(limited by available line styles for plotting) - if a user sees this, it's not going to make sense. You can remove this part.

I don't think there needs to be a maximum replicate range in this function. It should go inside the plotting function since that's where the plotting limitation

R/TPR_Power_Curve.R

tonywu1999 · 2026-03-22T22:19:11Z

R/TPR_Power_Curve.R

+#' Run TPR simulation across a grid of concentration counts and replicate counts
+#'
+#' Sweeps over combinations of dose counts (2-9) and replicate counts,
+#' calling \code{futureExperimentSimulation()} for each combination.


People don't know what TPR simulation means or what it means to sweep over dose counts / replicate counts calling this futureExperimentSimulation function.

Can you be more thorough in a way so that a biologist with minimal coding experience would understand what this function does?

tonywu1999 · 2026-03-22T22:26:02Z

R/TPR_Power_Curve.R

+#'
+#' @importFrom dplyr filter mutate select if_else
+#' @export
+run_tpr_simulation <- function(rep_range, n_proteins = 1000) {


There should be unit tests for this function

tonywu1999 · 2026-03-22T22:28:48Z

R/TPR_Power_Curve.R

+  if (!is.numeric(rep_range) || length(rep_range) != 2 || rep_range[1] > rep_range[2]) {
+    stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.")
+  }
+  if (rep_range[2] > 5) {


Shouldn't this validation be for the difference between max and min? Not the max replicate value.

tonywu1999 · 2026-03-22T22:33:40Z

R/TPR_Power_Curve.R

+  if (!requireNamespace("plotly", quietly = TRUE)) {
+    stop("Package 'plotly' is required for interactive plots. Please install it.")


We don't need this because plotly should already be a dependency of this package, meaning users will have it automatically installed when they install MSstatsResponse

tonywu1999 · 2026-03-22T22:34:22Z

R/TPR_Power_Curve.R

+  }
+  ltype_values <- setNames(ltypes[seq_along(rep_levels)], as.character(rep_levels))
+
+  make_panel <- function(data, color, show_legend = FALSE) {


Put this outside of the plotting function instead of inside it and rename it so that it's clear what it's doing, aka plotting the TPR curves using ggplot2/plotly.

tonywu1999 · 2026-03-22T22:36:32Z

man/VisualizeResponseProtein.Rd

@@ -1,93 +0,0 @@
-% Generated by roxygen2: do not edit by hand


All of the documentation files in the man folder should NOT be deleted. Can you fix it so that devtools::document() does not remove these files?

tonywu1999 · 2026-03-23T12:52:47Z

R/TPR_Power_Curve.R

+  k_grid <- sort(unique(simulation_results$NumConcs))
+  rep_levels <- sort(unique(simulation_results$N_rep))
+
+  ltypes <- c("dotted", "dotdash", "dashed", "longdash", "solid")


On second thought - I thought we discussed doing a color gradient instead? In that case, 5 replicate validation is not needed either.

- Replace with actual dataset and protein ID - Replace hard-coded concentrations with a mechanism to pick based on farthest distance from the log(median) - Replace the linestyles with colot palettes for plot representation and removed maximum replicate rage - Provide examples for TPR simulation usage - Re-write the comments in a biologist friendly tone - Extract the single TPR simulation logic into a helper function - Improve the plotly import method - Write unit test cases for the plotly function - Move plotly from suggested into required dependency and remove the installation check from code

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (4)

R/ExperimentalDesignSimulation.R (1)
249-254: Potential division by zero or NaN when all profile doses are zero.

If the user's data contains only control measurements (all dose_numeric == 0), the condition at line 249 (any(profile$dose_numeric != 0)) will be FALSE, leaving LogIntensities as NA for non-zero target concentrations. The subsequent fill loop (lines 258-305) handles this, but it's worth noting that this relies on the fallback behavior.

This is acceptable given the fallback logic, but consider adding a warning when no non-zero doses exist in the profile data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/ExperimentalDesignSimulation.R` around lines 249 - 254, The code currently
skips filling LogIntensities when a profile has no non-zero doses (the else
branch when any(profile$dose_numeric != 0) is FALSE), which can leave NA values
relying on later fallback; add an explicit check for the all-zero-doses case
(using profile$dose_numeric) and emit a clear warning (e.g., via warning()) that
this profile contains only control measurements so non-zero target
concentrations cannot be matched, referencing the profile identifier (or row
index) and the object names used here (profile, non_zero,
complete_profile$LogIntensities) so the user can locate the problematic profile;
do not change the existing fallback logic, only add the warning in the branch
where no non-zero doses exist.
tests/testthat/test-TPR_Power_Curve.R (1)
92-105: Consider adding a test for the data provided without protein case.

This test verifies that data=NULL, protein=NULL works, but doesn't test the case where data is provided without protein. Based on futureExperimentSimulation's validation, this should fail with "must specify at least one" error.
🧪 Proposed additional test
test_that("run_tpr_simulation errors when data provided without protein", {
  mock_data <- data.frame(
    protein = "P1",
    drug = c("DMSO", "Drug1"),
    dose = c(0, 1e-6),
    response = c(20, 18)
  )
  
  expect_error(
    run_tpr_simulation(
      rep_range = c(1, 1),
      concentrations = c(0, 100, 1000),
      dose_range = c(2, 3),
      n_proteins = 10,
      data = mock_data,
      protein = NULL
    ),
    "must specify"
  )
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/testthat/test-TPR_Power_Curve.R` around lines 92 - 105, Add a test that
calls run_tpr_simulation with a non-NULL data.frame but protein = NULL and
asserts it errors with the "must specify" message; create a small mock_data
(columns: protein, drug, dose, response) and wrap the call in expect_error(...,
"must specify") to mirror futureExperimentSimulation's validation and ensure
run_tpr_simulation surfaces the expected validation error.
R/TPR_Power_Curve.R (2)
76-78: Seed collision possible for different parameter combinations.

The seed formula seed + n_rep * 100 + length(concs) can produce identical seeds for different (n_rep, concs) pairs. For example, (n_rep=1, length=102) and (n_rep=2, length=2) both produce seed + 202.

Consider using a hash-based approach or including both values in a way that prevents collision:
♻️ Proposed fix for unique seeding
 .run_one_tpr_simulation <- function(n_rep, concs, n_proteins, data = NULL,
                                      protein = NULL, seed = 123) {
-  set.seed(seed + n_rep * 100 + length(concs))
+  # Use prime multiplier to reduce collision likelihood
+  set.seed(seed + n_rep * 1009 + length(concs) * 17)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 76 - 78, The current seed formula in
.run_one_tpr_simulation (seed + n_rep * 100 + length(concs)) can collide for
different (n_rep, concs) pairs; replace it with a collision-resistant seed
derived from hashing the combined parameters: compute a new_seed by hashing a
string built from seed, n_rep and the full concs vector (e.g., paste(seed,
n_rep, paste(concs, collapse=","))) using a stable hash (digest::digest with
xxhash32 or sha1), convert a portion of the hex hash to an integer
(strtoi(substr(...), 16L)) and use set.seed(new_seed); update
NAMESPACE/DESCRIPTION or imports to include digest if not already present.
284-299: Plot only shows "Strong" interaction results.

The function filters to Interaction == "Strong" at line 288, discarding "Weak" interaction data. The documentation at lines 260-261 mentions "user-selected protein template" but doesn't clarify that only strong interactions are visualized.

If this is intentional, consider updating the documentation. If users might want to see both interaction types, consider adding a parameter to control this.
📝 Documentation clarification
 #' Creates an interactive plot showing how the true positive rate
 #' (detection power) changes with the number of doses and replicates
-#' for the user-selected protein template. Line color shading goes
+#' for strong interactions only. Line color shading goes
 #' from light (fewest replicates) to dark (most replicates).
+#'
+#' Note: This function currently visualizes only the "Strong" interaction
+#' category. Weak interaction results from \code{run_tpr_simulation} are
+#' not displayed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 284 - 299, The function
plot_tpr_power_curve currently hard-filters simulation_results to Interaction ==
"Strong" (results_protein), which drops "Weak" data; either make this behavior
configurable by adding an argument (e.g., interaction_type or
include_interactions) to plot_tpr_power_curve and use that parameter when
subsetting simulation_results, or if the plot must remain Strong-only, update
the function documentation and comment to state explicitly that only "Strong"
interactions are plotted; locate the filtering logic around results_protein and
the function signature plot_tpr_power_curve to implement the parameter or adjust
the docs accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/ExperimentalDesignSimulation.R`:
- Around line 336-349: The default_template filtering using exact matching
(default_template$weak_interaction[default_template$weak_interaction$dose %in%
concentrations, ] and similarly for default_template$no_interaction) can return
zero rows for non-standard concentrations; change the logic in the
weak_interaction and no_interaction branches to first attempt the exact filter
and if the result has zero rows fall back to calling .getMeanProfile(NULL,
drug_data, concentrations) (or otherwise map concentrations to nearest-template
doses) so that the returned profile always has length(concentrations),
referencing weak_interaction, no_interaction, default_template, .getMeanProfile
and concentrations to locate the code to modify.

In `@R/TPR_Power_Curve.R`:
- Around line 86-91: When building sim_args in TPR_Power_Curve.R, ensure we
validate the dependency between data and protein: if data is provided
(sim_args$data set) but protein is NULL, either require the caller to supply
protein or provide a clear error; add a check after setting sim_args$data that
stops with a descriptive message (referencing futureExperimentSimulation's
requirement) if protein is NULL, or alternatively set
sim_args$strong_proteins/weak_proteins/no_interaction_proteins appropriately;
update the validation around sim_args and the parameter named protein so
futureExperimentSimulation won't error unexpectedly.

---

Nitpick comments:
In `@R/ExperimentalDesignSimulation.R`:
- Around line 249-254: The code currently skips filling LogIntensities when a
profile has no non-zero doses (the else branch when any(profile$dose_numeric !=
0) is FALSE), which can leave NA values relying on later fallback; add an
explicit check for the all-zero-doses case (using profile$dose_numeric) and emit
a clear warning (e.g., via warning()) that this profile contains only control
measurements so non-zero target concentrations cannot be matched, referencing
the profile identifier (or row index) and the object names used here (profile,
non_zero, complete_profile$LogIntensities) so the user can locate the
problematic profile; do not change the existing fallback logic, only add the
warning in the branch where no non-zero doses exist.

In `@R/TPR_Power_Curve.R`:
- Around line 76-78: The current seed formula in .run_one_tpr_simulation (seed +
n_rep * 100 + length(concs)) can collide for different (n_rep, concs) pairs;
replace it with a collision-resistant seed derived from hashing the combined
parameters: compute a new_seed by hashing a string built from seed, n_rep and
the full concs vector (e.g., paste(seed, n_rep, paste(concs, collapse=",")))
using a stable hash (digest::digest with xxhash32 or sha1), convert a portion of
the hex hash to an integer (strtoi(substr(...), 16L)) and use
set.seed(new_seed); update NAMESPACE/DESCRIPTION or imports to include digest if
not already present.
- Around line 284-299: The function plot_tpr_power_curve currently hard-filters
simulation_results to Interaction == "Strong" (results_protein), which drops
"Weak" data; either make this behavior configurable by adding an argument (e.g.,
interaction_type or include_interactions) to plot_tpr_power_curve and use that
parameter when subsetting simulation_results, or if the plot must remain
Strong-only, update the function documentation and comment to state explicitly
that only "Strong" interactions are plotted; locate the filtering logic around
results_protein and the function signature plot_tpr_power_curve to implement the
parameter or adjust the docs accordingly.

In `@tests/testthat/test-TPR_Power_Curve.R`:
- Around line 92-105: Add a test that calls run_tpr_simulation with a non-NULL
data.frame but protein = NULL and asserts it errors with the "must specify"
message; create a small mock_data (columns: protein, drug, dose, response) and
wrap the call in expect_error(..., "must specify") to mirror
futureExperimentSimulation's validation and ensure run_tpr_simulation surfaces
the expected validation error.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fb2bd13d-17ca-4496-9c82-a9a402fcfaf4

📥 Commits

Reviewing files that changed from the base of the PR and between 8b7eaf9 and b1ed032.

📒 Files selected for processing (9)

DESCRIPTION
NAMESPACE
R/ExperimentalDesignSimulation.R
R/TPR_Power_Curve.R
man/DoseResponseFit.Rd
man/plot_tpr_power_curve.Rd
man/run_tpr_simulation.Rd
tests/testthat/test-ExperimentalDesignSimulation.R
tests/testthat/test-TPR_Power_Curve.R

✅ Files skipped from review due to trivial changes (3)

man/DoseResponseFit.Rd
man/plot_tpr_power_curve.Rd
man/run_tpr_simulation.Rd

🚧 Files skipped from review as they are similar to previous changes (2)

DESCRIPTION
NAMESPACE

R/ExperimentalDesignSimulation.R

R/TPR_Power_Curve.R

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 25-27: The code currently sets control <- min(concentrations),
which contradicts the docstring that control must be 0; change this to
explicitly require/control for 0: validate that 0 is present in the
concentrations vector at the start of the function (stop with an informative
error if not), then set control <- 0 (keep highest <- max(concentrations) and
compute candidates <- setdiff(concentrations, c(control, highest))). Use the
variable names concentrations, control, highest, and candidates to locate and
modify the logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e78deb24-c9be-4966-a4b6-7fbb124766bf

📥 Commits

Reviewing files that changed from the base of the PR and between b1ed032 and cfabb47.

📒 Files selected for processing (4)

R/ExperimentalDesignSimulation.R
R/TPR_Power_Curve.R
man/plot_tpr_power_curve.Rd
tests/testthat/test-TPR_Power_Curve.R

✅ Files skipped from review due to trivial changes (2)

man/plot_tpr_power_curve.Rd
tests/testthat/test-TPR_Power_Curve.R

R/TPR_Power_Curve.R

tonywu1999 · 2026-03-31T16:44:54Z

R/ExperimentalDesignSimulation.R

+    for (i in seq_len(nrow(complete_profile))) {
+      target = complete_profile$dose[i]
+      if (target == 0) {
+        # Control: match any zero-dose entry
+        zero_match = profile[profile$dose_numeric == 0, ]
+        if (nrow(zero_match) > 0) {
+          complete_profile$LogIntensities[i] = zero_match$LogIntensities[1]
+        }
+      } else if (any(profile$dose_numeric != 0)) {
+        # Non-zero: find closest on log scale
+        non_zero = profile[profile$dose_numeric != 0, ]
+        log_ratios = abs(log10(non_zero$dose_numeric) - log10(target))
+        best = which.min(log_ratios)
+        complete_profile$LogIntensities[i] = non_zero$LogIntensities[best]
+      }
+    }


I think this loop is overly complex. Shouldn't the dose_str column of the profile variable match the string version of the complete_profile$dose column? This could be a simple left_join if you have a string version of the dose column in complete_profile.

tonywu1999 · 2026-03-31T16:47:09Z

R/ExperimentalDesignSimulation.R

-    complete_profile = complete_profile %>%
-      left_join(profile, by = "dose")
+    if (nrow(profile) > 0 && !any(profile$dose_numeric != 0)) {
+      warning("No non-zero dose measurements found for the selected protein. Template will use fallback values.")


I think this should be a stop instead of a warning. It doesn't make sense to be able to make simulations for sample size calculation if your dataset has only dose 0.

tonywu1999 · 2026-03-31T16:55:34Z

R/ExperimentalDesignSimulation.R

+  default_template <- NULL
+  if (is.null(weak_proteins) || is.null(no_interaction_proteins)) {
+    template1_path <- system.file("extdata", "template1.RDS", package = "MSstatsResponse")
+    if (file.exists(template1_path)) {
+      default_template <- readRDS(template1_path)
+    }
+  }
+


This seems to be a hard-coded template specific to the dataset you were given, but not extensible to other datasets. I think if a weak interaction / no interaction protein is not defined, we should fallback to giving the baseline profile.

tonywu1999 · 2026-03-31T16:58:05Z

R/ExperimentalDesignSimulation.R

+    weak_interaction = if (!is.null(weak_proteins)) {
+      .getMeanProfile(weak_proteins, drug_data, concentrations)
+    } else if (!is.null(default_template)) {
+      default_template$weak_interaction[default_template$weak_interaction$dose %in% concentrations, ]
+    } else {
+      .getMeanProfile(NULL, drug_data, concentrations)
+    },
+    no_interaction = if (!is.null(no_interaction_proteins)) {
+      .getMeanProfile(no_interaction_proteins, drug_data, concentrations)
+    } else if (!is.null(default_template)) {
+      default_template$no_interaction[default_template$no_interaction$dose %in% concentrations, ]
+    } else {
+      .getMeanProfile(NULL, drug_data, concentrations)


Will anything break in the package if weak_interaction / no_interaction is set to the baseline profile (given we never provided any weak interaction / no interaction protein)?

tonywu1999 · 2026-03-31T17:05:11Z

R/ExperimentalDesignSimulation.R

+
+  # Extract templates: use user data for specified proteins, defaults for others
  template = list(
    strong_interaction = .getMeanProfile(strong_proteins, drug_data, concentrations),


I'm a little confused, what does .getMeanProfile return when you defined a non-null strong_proteins? Reading the function, it doesn't return anything.

When protein_ids is non-null, .getMeanProfile returns a data.frame with columns dose, Intensity, LogIntensities, ratio - the function filters drug_data for those proteins, groups by dose, computes mean LogIntensities, then fills missing doses via interpolation. The return is at the end of the function.

tonywu1999 · 2026-03-31T17:16:36Z

R/TPR_Power_Curve.R

+  result <- selection_order[as.character(k_range)]
+  result <- result[!sapply(result, is.null)]
+
+  if (length(result) == 0) {


Under what circumstances does this occur? Because it would be helpful to have validation earlier on in the function that this ladder can't be constructed, this error message makes it unclear what the user needs to do.

tonywu1999 · 2026-03-31T17:19:01Z

R/TPR_Power_Curve.R

+#' @param data Optional prepared dose-response data for user-based templates.
+#' @param protein Optional protein ID for strong interaction template.


I think these parameters should be required.

Previously, it appeared like the could be optional because we had the template.rds file in the package, but that file is specific to a dataset that Sarah had access to. I think privacy issues made it so that Sarah couldn't put the full dataset and run everything E2E in the package.

tonywu1999 · 2026-03-31T17:21:32Z

R/TPR_Power_Curve.R

+#' @noRd
+.run_one_tpr_simulation <- function(n_rep, concs, n_proteins, data = NULL,
+                                     protein = NULL, seed = 123) {
+  set.seed(seed + n_rep * 100 + length(concs))


I think we can just do set.seed(seed). Users can always adjust their seed number based on number of replicates / length of concentration

tonywu1999 · 2026-03-31T17:22:41Z

R/TPR_Power_Curve.R

+
+  temp_res <- do.call(futureExperimentSimulation, sim_args)
+  temp_res$Hit_Rates_Data |>
+    dplyr::filter(Category %in% c("TPR (Strong)", "TPR (Weak)")) |>


I think it's safe to just filter only "TPR (Strong)" here.

tonywu1999 · 2026-03-31T17:27:24Z

R/TPR_Power_Curve.R

+#' @param data Optional. User's prepared dose-response data (e.g., from
+#'   \code{MSstatsPrepareDoseResponseFit}). If provided, protein-specific
+#'   templates are extracted from this data instead of using defaults.
+#' @param protein Optional. Character string specifying a protein ID to use as
+#'   the strong interaction template. Only used when \code{data} is provided.


As mentioned in another comment, should be required

- Simplify the profile matching loop - Use Stop instead of warning if there are no non-zero dose - Remove template1.RDS fallback, use baseline - Enhance the message for not generating any valid concentration subsets - Make data and protein required and remove superfluous validations - Update corresponding docs for turning the optional fields mandatory - Simplify seed in one tpr simulation - Filter only TPR strong - Update the test cases

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

R/TPR_Power_Curve.R (2)

288-304: Consider adding a guard for empty results.

If simulation_results contains no rows where Interaction == "Strong" (e.g., due to upstream data issues), results_protein will be empty and the plot will render without data or error message. This edge case is unlikely given the current pipeline but could be confusing if encountered.

🛡️ Optional: Add validation for empty data

 plot_tpr_power_curve <- function(simulation_results) {
   k_grid <- sort(unique(simulation_results$NumConcs))

   # Use only the Strong interaction results (user-selected protein template)
   results_protein <- simulation_results[simulation_results$Interaction == "Strong", ]
+
+  if (nrow(results_protein) == 0) {
+    stop("No 'Strong' interaction results found in simulation_results.")
+  }

   p <- .make_tpr_panel(results_protein, "#a6dba0", "#1b7837", k_grid, TRUE)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 288 - 304, Add a guard in
plot_tpr_power_curve to handle the case where results_protein
(simulation_results filtered by Interaction == "Strong") is empty: detect if
results_protein has zero rows and either abort with a clear message (e.g., stop
or warning) or return a simple empty/placeholder plotly object with an
explanatory annotation, instead of calling .make_tpr_panel with empty data;
reference results_protein, plot_tpr_power_curve and .make_tpr_panel to locate
where to insert the check and subsequent early return.

97-104: Simplify redundant if_else to direct assignment.

Since line 98 filters to only Category == "TPR (Strong)", the if_else at line 102 will always evaluate to "Strong". The "Weak" branch is dead code.

♻️ Proposed simplification

   temp_res$Hit_Rates_Data |>
     dplyr::filter(Category == "TPR (Strong)") |>
     dplyr::mutate(
       N_rep = n_rep,
       NumConcs = length(concs),
-      Interaction = dplyr::if_else(Category == "TPR (Strong)", "Strong", "Weak")
+      Interaction = "Strong"
     ) |>
     dplyr::select(Interaction, TPR = Percent, N_rep, NumConcs)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 97 - 104, The mutate call on
temp_res$Hit_Rates_Data sets Interaction using dplyr::if_else but the preceding
dplyr::filter(Category == "TPR (Strong)") ensures Category is always "TPR
(Strong)", so replace the conditional with a direct assignment: in the
dplyr::mutate that defines Interaction (within the pipeline starting at
temp_res$Hit_Rates_Data |> ...), set Interaction = "Strong" instead of using
dplyr::if_else to remove the dead "Weak" branch and simplify the expression.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 148-156: The example fails because run_tpr_simulation requires
arguments data and protein with no defaults; update the function signature for
run_tpr_simulation to set data = NULL and protein = NULL (or alternatively
update the example to supply minimal mock data/protein or wrap the example in
\dontrun{}), and ensure any internal code in run_tpr_simulation (e.g., checks or
operations that assume non-NULL data/protein) handles NULL appropriately so the
examples run without error.

---

Nitpick comments:
In `@R/TPR_Power_Curve.R`:
- Around line 288-304: Add a guard in plot_tpr_power_curve to handle the case
where results_protein (simulation_results filtered by Interaction == "Strong")
is empty: detect if results_protein has zero rows and either abort with a clear
message (e.g., stop or warning) or return a simple empty/placeholder plotly
object with an explanatory annotation, instead of calling .make_tpr_panel with
empty data; reference results_protein, plot_tpr_power_curve and .make_tpr_panel
to locate where to insert the check and subsequent early return.
- Around line 97-104: The mutate call on temp_res$Hit_Rates_Data sets
Interaction using dplyr::if_else but the preceding dplyr::filter(Category ==
"TPR (Strong)") ensures Category is always "TPR (Strong)", so replace the
conditional with a direct assignment: in the dplyr::mutate that defines
Interaction (within the pipeline starting at temp_res$Hit_Rates_Data |> ...),
set Interaction = "Strong" instead of using dplyr::if_else to remove the dead
"Weak" branch and simplify the expression.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bc5f754e-c855-4899-8011-be5b09124d28

📥 Commits

Reviewing files that changed from the base of the PR and between cfabb47 and bf82171.

📒 Files selected for processing (5)

R/ExperimentalDesignSimulation.R
R/TPR_Power_Curve.R
man/run_tpr_simulation.Rd
tests/testthat/test-ExperimentalDesignSimulation.R
tests/testthat/test-TPR_Power_Curve.R

✅ Files skipped from review due to trivial changes (3)

R/ExperimentalDesignSimulation.R
man/run_tpr_simulation.Rd
tests/testthat/test-TPR_Power_Curve.R

🚧 Files skipped from review as they are similar to previous changes (1)

tests/testthat/test-ExperimentalDesignSimulation.R

coderabbitai · 2026-04-03T04:10:10Z

R/TPR_Power_Curve.R

+#' @examples
+#' # Quick example with small simulation (for speed)
+#' results <- run_tpr_simulation(
+#'   rep_range = c(1, 2),
+#'   concentrations = c(0, 10, 100, 1000),
+#'   dose_range = c(2, 4),
+#'   n_proteins = 50
+#' )
+#' head(results)


⚠️ Potential issue | 🟡 Minor

Documentation example will fail: data and protein are required but not provided.

The function signature requires data and protein (no default values), but the "Quick example" omits them. Running this example will error:

Error in run_tpr_simulation(rep_range = c(1, 2), ...) : argument "data" is missing, with no default

Either add data = NULL, protein = NULL defaults to the function signature, or update the example to include placeholder/mock data.

🛡️ Option 1: Make parameters optional with NULL defaults

-run_tpr_simulation <- function(rep_range, concentrations, dose_range, - data, protein, n_proteins = 1000) { +run_tpr_simulation <- function(rep_range, concentrations, dose_range, + data = NULL, protein = NULL, n_proteins = 1000) {

📝 Option 2: Wrap the example in \dontrun{}

-#' # Quick example with small simulation (for speed) -#' results <- run_tpr_simulation( +#' \dontrun{ +#' # Example with user-provided data +#' results <- run_tpr_simulation( #' rep_range = c(1, 2), #' concentrations = c(0, 10, 100, 1000), #' dose_range = c(2, 4), +#' data = my_prepared_data, +#' protein = "P12345", #' n_proteins = 50 #' ) -#' head(results) +#' }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

#' @examples

#' # Quick example with small simulation (for speed)

#' results <- run_tpr_simulation(

#' rep_range = c(1, 2),

#' concentrations = c(0, 10, 100, 1000),

#' dose_range = c(2, 4),

#' n_proteins = 50

#' )

#' head(results)

#' `@examples`

#' \dontrun{

#' # Example with user-provided data

#' results <- run_tpr_simulation(

#' rep_range = c(1, 2),

#' concentrations = c(0, 10, 100, 1000),

#' dose_range = c(2, 4),

#' data = my_prepared_data,

#' protein = "P12345",

#' n_proteins = 50

#' )

#' }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/TPR_Power_Curve.R` around lines 148 - 156, The example fails because run_tpr_simulation requires arguments data and protein with no defaults; update the function signature for run_tpr_simulation to set data = NULL and protein = NULL (or alternatively update the example to supply minimal mock data/protein or wrap the example in \dontrun{}), and ensure any internal code in run_tpr_simulation (e.g., checks or operations that assume non-NULL data/protein) handles NULL appropriately so the examples run without error.

Transfer the TPR simulation function from MSstatsShiny and export it …

5a26bdc

…for external use

swaraj-neu requested a review from tonywu1999 March 20, 2026 20:50

swaraj-neu self-assigned this Mar 20, 2026

swaraj-neu added the enhancement New feature or request label Mar 20, 2026

coderabbitai bot reviewed Mar 20, 2026

View reviewed changes

R/TPR_Power_Curve.R Outdated Show resolved Hide resolved

R/TPR_Power_Curve.R Outdated Show resolved Hide resolved

Resolve TPR function nitpicks

8b7eaf9

tonywu1999 reviewed Mar 22, 2026

View reviewed changes

tonywu1999 reviewed Mar 23, 2026

View reviewed changes

swaraj-neu added 2 commits March 26, 2026 10:52

Rollback the documentation files from dev branch

5d87188

coderabbitai bot reviewed Mar 30, 2026

View reviewed changes

R/ExperimentalDesignSimulation.R Outdated Show resolved Hide resolved

R/TPR_Power_Curve.R Outdated Show resolved Hide resolved

Resolve nitpicks in TPR power curve function

cfabb47

coderabbitai bot reviewed Mar 30, 2026

View reviewed changes

R/TPR_Power_Curve.R Show resolved Hide resolved

Resolved a documentation nitpick in TPR simulation function

38a10d5

tonywu1999 reviewed Mar 31, 2026

View reviewed changes

coderabbitai bot reviewed Apr 3, 2026

View reviewed changes

Update the example in the docs for TPR simulation function

6236e60

		if (!requireNamespace("plotly", quietly = TRUE)) {
		stop("Package 'plotly' is required for interactive plots. Please install it.")

		@@ -1,93 +0,0 @@
		% Generated by roxygen2: do not edit by hand

		#' @param data Optional prepared dose-response data for user-based templates.
		#' @param protein Optional protein ID for strong interaction template.

Conversation

swaraj-neu commented Mar 20, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Detailed Changes

Unit Tests Added / Modified

Coding Guidelines / Policy Notes (violations or breaking changes)

Uh oh!

coderabbitai bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

swaraj-neu commented Mar 20, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 20, 2026 •

edited

Loading