Skip to content

Merge TPR simulation function into Development#6

Open
swaraj-neu wants to merge 8 commits intodevelfrom
feat/tpr-function
Open

Merge TPR simulation function into Development#6
swaraj-neu wants to merge 8 commits intodevelfrom
feat/tpr-function

Conversation

@swaraj-neu
Copy link
Copy Markdown

@swaraj-neu swaraj-neu commented Mar 20, 2026

Transfer the TPR simulation function from MSstatsShiny and export it for external use

Motivation and Context

This PR ports a TPR (True Positive Rate) simulation workflow from MSstatsShiny into MSstatsResponse and exposes it as a reusable public API. Motivation: provide programmatic, testable simulation of detection power across replicate counts and dose configurations plus an interactive visualization. Solution: implement concentration-ladder generation, orchestrate many simulations (via futureExperimentSimulation), expose run_tpr_simulation() and plot_tpr_power_curve(), add docs and unit tests.

Detailed Changes

  • DESCRIPTION

    • Added plotly to Imports (plotly becomes a runtime dependency).
  • NAMESPACE / Public API

    • Exported run_tpr_simulation(rep_range, concentrations, dose_range, data, protein, n_proteins = 1000).
    • Exported plot_tpr_power_curve(simulation_results).
    • Added imports: dplyr::if_else, ggplot2::theme_bw, grDevices::colorRampPalette, plotly::ggplotly, plotly::layout.
  • New file R/TPR_Power_Curve.R

    • .build_concentration_ladders(concentrations, dose_range): sorts/deduplicates concentrations, requires at least two unique values including control (0), always includes min (0) and max, greedily selects intermediate doses by maximal distance in log10(conc + 1) from current subset median, returns named list keyed by subset size; errors if no subsets possible.
    • .run_one_tpr_simulation(n_rep, concs, n_proteins, data, protein, seed=123): deterministic seeding, constructs args for futureExperimentSimulation (N_proteins, N_rep, Concentrations, IC50_Prediction = FALSE), passes data and strong_proteins, filters Hit_Rates_Data to Category == "TPR (Strong)" and maps Interaction to "Strong"/"Weak", returns Interaction / TPR (Percent) / N_rep / NumConcs.
    • run_tpr_simulation(rep_range, concentrations, dose_range, data, protein, n_proteins=1000): validates rep_range and dose_range (length-2 numeric, min<=max, dose_range[1] >= 2), builds concentration ladders, expands grid of N_rep × NumConcs, runs .run_one_tpr_simulation() for each config with tryCatch, aggregates successful runs into a data.frame; errors if all runs fail.
    • plot_tpr_power_curve(simulation_results): filters results to Interaction == "Strong", computes x breaks from sorted unique NumConcs, creates ggplot panel (TPR vs NumConcs) colored by N_rep with light→dark gradient, converts to interactive plotly object and applies annotation/layout tweaks.
  • R/ExperimentalDesignSimulation.R

    • .getMeanProfile(): replaced numeric rounding-based grouping with string-based dose grouping and nearest-neighbor/interpolation logic to align observed mean LogIntensities to requested concentrations; added guard to Stop when a non-empty profile contains no non-zero doses.
    • (Note: prior behavior/fallback to external template file was removed per commit messages—template fallback behavior changed to use baseline.)
  • Documentation

    • Added man/run_tpr_simulation.Rd (usage, parameters, behavior, examples).
    • Added man/plot_tpr_power_curve.Rd (usage, parameters, return value, example).
    • Updated man/DoseResponseFit.Rd example: filter uses adj.pvalue rather than adjust_pval.

Unit Tests Added / Modified

  • Added tests/testthat/test-TPR_Power_Curve.R

    • Tests .build_concentration_ladders: verifies expected subset counts and names, that each subset includes control (0) and max dose, errors on <2 unique concentrations, errors when 0 absent, and ensures subset elements come from input concentrations.
    • run_tpr_simulation argument validation: errors for reversed or non-numeric rep_range; errors for invalid/reversed dose_range and insufficient bounds.
    • run_tpr_simulation output checks: returns data.frame with columns Interaction, TPR, N_rep, NumConcs; Interaction restricted to "Strong" (tests filter behavior); TPR within [0,100].
    • plot_tpr_power_curve returns an object inheriting from plotly when given simulation results.
  • Updated tests/testthat/test-ExperimentalDesignSimulation.R

    • Adjusted interpolation/missing-dose test to expect endpoint 3000 and exact dose vector c(0,10,100,1000,3000).
    • Expanded empty-template tests to assert templates for weak_interaction and no_interaction have 2 rows and required columns, and verify ratio values.

Coding Guidelines / Policy Notes (violations or breaking changes)

  • API stability: run_tpr_simulation now requires data and protein positional arguments (no longer optional/defaulting to NULL). This is a breaking change for callers that previously relied on data = NULL and protein = NULL defaults.
  • Behavior change: template fallback logic was changed/removed (commit notes indicate removed template1.RDS fallback and use baseline), which alters external behavior for cases that previously relied on that fallback.
  • Error handling: stricter failure modes introduced (Stop is used instead of warnings when no non-zero dose is present and when all simulations fail), which may cause immediate errors where previous behavior only warned.
  • Dependency addition: plotly added as a hard runtime dependency; callers/environments must have plotly available.

@swaraj-neu swaraj-neu requested a review from tonywu1999 March 20, 2026 20:50
@swaraj-neu swaraj-neu self-assigned this Mar 20, 2026
@swaraj-neu swaraj-neu added the enhancement New feature or request label Mar 20, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds end-to-end TPR power-curve simulation and interactive Plotly plotting, implements concentration-ladder construction and per-configuration simulation, updates experiment mean-profile dose matching, adjusts package Imports/NAMESPACE, and adds documentation plus unit tests for the new features.

Changes

Cohort / File(s) Summary
TPR Power-Curve Implementation
R/TPR_Power_Curve.R
New module providing .build_concentration_ladders(), .run_one_tpr_simulation(), exported run_tpr_simulation() and exported plot_tpr_power_curve() (ggplot → plotly conversion).
Package Metadata & NAMESPACE
DESCRIPTION, NAMESPACE
Added plotly to Imports; exported run_tpr_simulation and plot_tpr_power_curve; added imports (if_else, theme_bw, colorRampPalette, ggplotly, layout).
Documentation
man/run_tpr_simulation.Rd, man/plot_tpr_power_curve.Rd, man/DoseResponseFit.Rd
Added man pages for run_tpr_simulation and plot_tpr_power_curve; corrected example column name in DoseResponseFit.Rd.
Experimental Design
R/ExperimentalDesignSimulation.R
Changed .getMeanProfile() to string-based dose grouping, assign dose_numeric from first group, join by stringified doses and fill missing LogIntensities via interpolation/NN; added guard for missing non-zero doses.
Tests
tests/testthat/test-TPR_Power_Curve.R, tests/testthat/test-ExperimentalDesignSimulation.R
Added tests for ladder generation, run_tpr_simulation validation/output structure, plotting returns plotly object; adjusted experimental-design tests to match new dose templates and additional assertions.

Sequence Diagram

sequenceDiagram
    participant User
    participant Runner as run_tpr_simulation
    participant Ladder as .build_concentration_ladders
    participant OneSim as .run_one_tpr_simulation
    participant Simulator as futureExperimentSimulation
    participant Plotter as plot_tpr_power_curve
    participant GG as ggplot2
    participant Plotly as plotly

    User->>Runner: call(rep_range, concentrations, dose_range, data, protein, ...)
    Runner->>Ladder: generate concentration subsets
    Ladder-->>Runner: concentration ladders
    Runner->>OneSim: iterate combos (N_rep, NumConcs)
    OneSim->>Simulator: simulate experiment (data, params)
    Simulator-->>OneSim: Hit_Rates_Data
    OneSim-->>Runner: filtered TPR rows (Interaction, TPR, N_rep, NumConcs)
    Runner-->>User: combined simulation results (data.frame)

    User->>Plotter: call(simulation_results)
    Plotter->>GG: build ggplot panel
    GG-->>Plotter: ggplot object
    Plotter->>Plotly: ggplotly + layout
    Plotly-->>Plotter: interactive plot
    Plotter-->>User: plotly visualization
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 Hopping doses in tidy rows,

I pick ladders where the log-median goes,
I seed simulations, count TPR light,
I draw curves that glow from faint to bright,
A little rabbit cheers the plots tonight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Merge TPR simulation function into Development' accurately reflects the main change: migrating and exporting a TPR simulation function from MSstatsShiny into this repository, making it publicly available.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/tpr-function

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
R/TPR_Power_Curve.R (1)

34-46: Consider simplifying by removing the intermediate sim_args list.

The sim_args list is created and then immediately unpacked in the function call. This adds verbosity without benefit.

♻️ Simplified version
   run_one <- function(n_rep, k_conc, seed = 123) {
     set.seed(seed + n_rep * 100 + k_conc)
     concs_k <- CONC_MAP[[as.character(k_conc)]]
 
-    sim_args <- list(
-      N_proteins = n_proteins,
-      N_rep = n_rep,
-      Concentrations = concs_k,
-      IC50_Prediction = FALSE
-    )
-
-    temp_res <- futureExperimentSimulation(
-      N_proteins = sim_args$N_proteins,
-      N_rep = sim_args$N_rep,
-      Concentrations = sim_args$Concentrations,
-      IC50_Prediction = sim_args$IC50_Prediction
-    )
+    temp_res <- futureExperimentSimulation(
+      N_proteins = n_proteins,
+      N_rep = n_rep,
+      Concentrations = concs_k,
+      IC50_Prediction = FALSE
+    )
     temp_res$Hit_Rates_Data |>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 34 - 46, The sim_args list is unnecessary
boilerplate: remove the sim_args variable and call futureExperimentSimulation
directly with the original variables (n_proteins, n_rep, concs_k, and FALSE) by
passing them to the corresponding parameters N_proteins, N_rep, Concentrations,
and IC50_Prediction in the futureExperimentSimulation call so you only use the
function name futureExperimentSimulation and the original symbols n_proteins,
n_rep, concs_k, and FALSE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 91-92: The linetype mapping currently uses a fixed vector ltypes
and assigns ltype_values <- setNames(ltypes[seq_along(rep_levels)],
as.character(rep_levels)), which will produce NAs when length(rep_levels) >
length(ltypes); change the assignment to repeat or recycle ltypes to cover all
replicate levels (e.g., generate a vector of length length(rep_levels) by
repeating ltypes with length.out = length(rep_levels) or using rep(ltypes,
length.out = length(rep_levels))) and then call setNames(...) so ltype_values
maps every element of rep_levels to a valid linetype; refer to the variables
ltypes, ltype_values and rep_levels when making the change.
- Around line 26-28: The function run_tpr_simulation currently assumes rep_range
is a two-element integer vector; add input validation at the top of
run_tpr_simulation to check that rep_range is a length-2 numeric (or integer)
vector with no NA/NaN/Inf values, both elements are whole numbers (or can be
safely coerced to integers), and rep_range[1] <= rep_range[2]; if any check
fails, stop with a clear error message mentioning rep_range and expected form
(e.g., "rep_range must be a length-2 integer vector with min <= max"). After
validation, coerce to integers (if needed) before creating rep_grid to avoid
downstream surprises when using rep_grid.

---

Nitpick comments:
In `@R/TPR_Power_Curve.R`:
- Around line 34-46: The sim_args list is unnecessary boilerplate: remove the
sim_args variable and call futureExperimentSimulation directly with the original
variables (n_proteins, n_rep, concs_k, and FALSE) by passing them to the
corresponding parameters N_proteins, N_rep, Concentrations, and IC50_Prediction
in the futureExperimentSimulation call so you only use the function name
futureExperimentSimulation and the original symbols n_proteins, n_rep, concs_k,
and FALSE.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b0cfe25-885d-428f-890c-886741a75f85

📥 Commits

Reviewing files that changed from the base of the PR and between 8739ce4 and 5a26bdc.

📒 Files selected for processing (10)
  • DESCRIPTION
  • NAMESPACE
  • R/TPR_Power_Curve.R
  • man/ConvertGroupToNumericDose.Rd
  • man/DoseResponseFit.Rd
  • man/FutureExperimentSimulation.Rd
  • man/PredictIC50Parallel.Rd
  • man/VisualizeResponseProtein.Rd
  • man/plot_tpr_power_curve.Rd
  • man/run_tpr_simulation.Rd
💤 Files with no reviewable changes (5)
  • man/ConvertGroupToNumericDose.Rd
  • man/VisualizeResponseProtein.Rd
  • man/PredictIC50Parallel.Rd
  • man/DoseResponseFit.Rd
  • man/FutureExperimentSimulation.Rd

IC50_Prediction = FALSE
)

temp_res <- futureExperimentSimulation(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused. Users should also pass in the actual dataset and the Protein ID that is considered a strong interaction. But I don't see where this is being passed to the function.

"8" = c(0, 1, 10, 30, 100, 300, 1000, 3000),
"9" = c(0, 1, 3, 10, 30, 100, 300, 1000, 3000)
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with Sarah, this should not be hard coded but rather each subsequent value should be picked based on farthest distance from the log(median) among a user's set of doses, not this hard-coded list.

A user will have any arbitrary set of doses as well (i.e. it's not just 0, 1, 3, 10. You might have 0, 2, 4, 20, ...)

This also means that another input parameter should be the concentrations themselves (e.g. 0,1,3,10,30,100,300,1000,3000) and the dose_range (e.g. c(2,9) being doses 2 through 9).

stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.")
}
if (rep_range[2] > 5) {
stop("Maximum replicates is 5 (limited by available line styles for plotting).")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(limited by available line styles for plotting) - if a user sees this, it's not going to make sense. You can remove this part.

I don't think there needs to be a maximum replicate range in this function. It should go inside the plotting function since that's where the plotting limitation

Comment on lines +14 to +17
#' Run TPR simulation across a grid of concentration counts and replicate counts
#'
#' Sweeps over combinations of dose counts (2-9) and replicate counts,
#' calling \code{futureExperimentSimulation()} for each combination.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People don't know what TPR simulation means or what it means to sweep over dose counts / replicate counts calling this futureExperimentSimulation function.

Can you be more thorough in a way so that a biologist with minimal coding experience would understand what this function does?

#'
#' @importFrom dplyr filter mutate select if_else
#' @export
run_tpr_simulation <- function(rep_range, n_proteins = 1000) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be unit tests for this function

if (!is.numeric(rep_range) || length(rep_range) != 2 || rep_range[1] > rep_range[2]) {
stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.")
}
if (rep_range[2] > 5) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this validation be for the difference between max and min? Not the max replicate value.

Comment on lines +130 to +131
if (!requireNamespace("plotly", quietly = TRUE)) {
stop("Package 'plotly' is required for interactive plots. Please install it.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this because plotly should already be a dependency of this package, meaning users will have it automatically installed when they install MSstatsResponse

}
ltype_values <- setNames(ltypes[seq_along(rep_levels)], as.character(rep_levels))

make_panel <- function(data, color, show_legend = FALSE) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this outside of the plotting function instead of inside it and rename it so that it's clear what it's doing, aka plotting the TPR curves using ggplot2/plotly.

@@ -1,93 +0,0 @@
% Generated by roxygen2: do not edit by hand
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the documentation files in the man folder should NOT be deleted. Can you fix it so that devtools::document() does not remove these files?

k_grid <- sort(unique(simulation_results$NumConcs))
rep_levels <- sort(unique(simulation_results$N_rep))

ltypes <- c("dotted", "dotdash", "dashed", "longdash", "solid")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought - I thought we discussed doing a color gradient instead? In that case, 5 replicate validation is not needed either.

- Replace with actual dataset and protein ID
- Replace hard-coded concentrations with a mechanism to pick based on farthest distance from the log(median)
- Replace the linestyles with colot palettes for plot representation and removed maximum replicate rage
- Provide examples for TPR simulation usage
- Re-write the comments in a biologist friendly tone
- Extract the single TPR simulation logic into a helper function
- Improve the plotly import method
- Write unit test cases for the plotly function
- Move plotly from suggested into required dependency and remove the installation check from code
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
R/ExperimentalDesignSimulation.R (1)

249-254: Potential division by zero or NaN when all profile doses are zero.

If the user's data contains only control measurements (all dose_numeric == 0), the condition at line 249 (any(profile$dose_numeric != 0)) will be FALSE, leaving LogIntensities as NA for non-zero target concentrations. The subsequent fill loop (lines 258-305) handles this, but it's worth noting that this relies on the fallback behavior.

This is acceptable given the fallback logic, but consider adding a warning when no non-zero doses exist in the profile data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/ExperimentalDesignSimulation.R` around lines 249 - 254, The code currently
skips filling LogIntensities when a profile has no non-zero doses (the else
branch when any(profile$dose_numeric != 0) is FALSE), which can leave NA values
relying on later fallback; add an explicit check for the all-zero-doses case
(using profile$dose_numeric) and emit a clear warning (e.g., via warning()) that
this profile contains only control measurements so non-zero target
concentrations cannot be matched, referencing the profile identifier (or row
index) and the object names used here (profile, non_zero,
complete_profile$LogIntensities) so the user can locate the problematic profile;
do not change the existing fallback logic, only add the warning in the branch
where no non-zero doses exist.
tests/testthat/test-TPR_Power_Curve.R (1)

92-105: Consider adding a test for the data provided without protein case.

This test verifies that data=NULL, protein=NULL works, but doesn't test the case where data is provided without protein. Based on futureExperimentSimulation's validation, this should fail with "must specify at least one" error.

🧪 Proposed additional test
test_that("run_tpr_simulation errors when data provided without protein", {
  mock_data <- data.frame(
    protein = "P1",
    drug = c("DMSO", "Drug1"),
    dose = c(0, 1e-6),
    response = c(20, 18)
  )
  
  expect_error(
    run_tpr_simulation(
      rep_range = c(1, 1),
      concentrations = c(0, 100, 1000),
      dose_range = c(2, 3),
      n_proteins = 10,
      data = mock_data,
      protein = NULL
    ),
    "must specify"
  )
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/testthat/test-TPR_Power_Curve.R` around lines 92 - 105, Add a test that
calls run_tpr_simulation with a non-NULL data.frame but protein = NULL and
asserts it errors with the "must specify" message; create a small mock_data
(columns: protein, drug, dose, response) and wrap the call in expect_error(...,
"must specify") to mirror futureExperimentSimulation's validation and ensure
run_tpr_simulation surfaces the expected validation error.
R/TPR_Power_Curve.R (2)

76-78: Seed collision possible for different parameter combinations.

The seed formula seed + n_rep * 100 + length(concs) can produce identical seeds for different (n_rep, concs) pairs. For example, (n_rep=1, length=102) and (n_rep=2, length=2) both produce seed + 202.

Consider using a hash-based approach or including both values in a way that prevents collision:

♻️ Proposed fix for unique seeding
 .run_one_tpr_simulation <- function(n_rep, concs, n_proteins, data = NULL,
                                      protein = NULL, seed = 123) {
-  set.seed(seed + n_rep * 100 + length(concs))
+  # Use prime multiplier to reduce collision likelihood
+  set.seed(seed + n_rep * 1009 + length(concs) * 17)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 76 - 78, The current seed formula in
.run_one_tpr_simulation (seed + n_rep * 100 + length(concs)) can collide for
different (n_rep, concs) pairs; replace it with a collision-resistant seed
derived from hashing the combined parameters: compute a new_seed by hashing a
string built from seed, n_rep and the full concs vector (e.g., paste(seed,
n_rep, paste(concs, collapse=","))) using a stable hash (digest::digest with
xxhash32 or sha1), convert a portion of the hex hash to an integer
(strtoi(substr(...), 16L)) and use set.seed(new_seed); update
NAMESPACE/DESCRIPTION or imports to include digest if not already present.

284-299: Plot only shows "Strong" interaction results.

The function filters to Interaction == "Strong" at line 288, discarding "Weak" interaction data. The documentation at lines 260-261 mentions "user-selected protein template" but doesn't clarify that only strong interactions are visualized.

If this is intentional, consider updating the documentation. If users might want to see both interaction types, consider adding a parameter to control this.

📝 Documentation clarification
 #' Creates an interactive plot showing how the true positive rate
 #' (detection power) changes with the number of doses and replicates
-#' for the user-selected protein template. Line color shading goes
+#' for strong interactions only. Line color shading goes
 #' from light (fewest replicates) to dark (most replicates).
+#'
+#' Note: This function currently visualizes only the "Strong" interaction
+#' category. Weak interaction results from \code{run_tpr_simulation} are
+#' not displayed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 284 - 299, The function
plot_tpr_power_curve currently hard-filters simulation_results to Interaction ==
"Strong" (results_protein), which drops "Weak" data; either make this behavior
configurable by adding an argument (e.g., interaction_type or
include_interactions) to plot_tpr_power_curve and use that parameter when
subsetting simulation_results, or if the plot must remain Strong-only, update
the function documentation and comment to state explicitly that only "Strong"
interactions are plotted; locate the filtering logic around results_protein and
the function signature plot_tpr_power_curve to implement the parameter or adjust
the docs accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/ExperimentalDesignSimulation.R`:
- Around line 336-349: The default_template filtering using exact matching
(default_template$weak_interaction[default_template$weak_interaction$dose %in%
concentrations, ] and similarly for default_template$no_interaction) can return
zero rows for non-standard concentrations; change the logic in the
weak_interaction and no_interaction branches to first attempt the exact filter
and if the result has zero rows fall back to calling .getMeanProfile(NULL,
drug_data, concentrations) (or otherwise map concentrations to nearest-template
doses) so that the returned profile always has length(concentrations),
referencing weak_interaction, no_interaction, default_template, .getMeanProfile
and concentrations to locate the code to modify.

In `@R/TPR_Power_Curve.R`:
- Around line 86-91: When building sim_args in TPR_Power_Curve.R, ensure we
validate the dependency between data and protein: if data is provided
(sim_args$data set) but protein is NULL, either require the caller to supply
protein or provide a clear error; add a check after setting sim_args$data that
stops with a descriptive message (referencing futureExperimentSimulation's
requirement) if protein is NULL, or alternatively set
sim_args$strong_proteins/weak_proteins/no_interaction_proteins appropriately;
update the validation around sim_args and the parameter named protein so
futureExperimentSimulation won't error unexpectedly.

---

Nitpick comments:
In `@R/ExperimentalDesignSimulation.R`:
- Around line 249-254: The code currently skips filling LogIntensities when a
profile has no non-zero doses (the else branch when any(profile$dose_numeric !=
0) is FALSE), which can leave NA values relying on later fallback; add an
explicit check for the all-zero-doses case (using profile$dose_numeric) and emit
a clear warning (e.g., via warning()) that this profile contains only control
measurements so non-zero target concentrations cannot be matched, referencing
the profile identifier (or row index) and the object names used here (profile,
non_zero, complete_profile$LogIntensities) so the user can locate the
problematic profile; do not change the existing fallback logic, only add the
warning in the branch where no non-zero doses exist.

In `@R/TPR_Power_Curve.R`:
- Around line 76-78: The current seed formula in .run_one_tpr_simulation (seed +
n_rep * 100 + length(concs)) can collide for different (n_rep, concs) pairs;
replace it with a collision-resistant seed derived from hashing the combined
parameters: compute a new_seed by hashing a string built from seed, n_rep and
the full concs vector (e.g., paste(seed, n_rep, paste(concs, collapse=",")))
using a stable hash (digest::digest with xxhash32 or sha1), convert a portion of
the hex hash to an integer (strtoi(substr(...), 16L)) and use
set.seed(new_seed); update NAMESPACE/DESCRIPTION or imports to include digest if
not already present.
- Around line 284-299: The function plot_tpr_power_curve currently hard-filters
simulation_results to Interaction == "Strong" (results_protein), which drops
"Weak" data; either make this behavior configurable by adding an argument (e.g.,
interaction_type or include_interactions) to plot_tpr_power_curve and use that
parameter when subsetting simulation_results, or if the plot must remain
Strong-only, update the function documentation and comment to state explicitly
that only "Strong" interactions are plotted; locate the filtering logic around
results_protein and the function signature plot_tpr_power_curve to implement the
parameter or adjust the docs accordingly.

In `@tests/testthat/test-TPR_Power_Curve.R`:
- Around line 92-105: Add a test that calls run_tpr_simulation with a non-NULL
data.frame but protein = NULL and asserts it errors with the "must specify"
message; create a small mock_data (columns: protein, drug, dose, response) and
wrap the call in expect_error(..., "must specify") to mirror
futureExperimentSimulation's validation and ensure run_tpr_simulation surfaces
the expected validation error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fb2bd13d-17ca-4496-9c82-a9a402fcfaf4

📥 Commits

Reviewing files that changed from the base of the PR and between 8b7eaf9 and b1ed032.

📒 Files selected for processing (9)
  • DESCRIPTION
  • NAMESPACE
  • R/ExperimentalDesignSimulation.R
  • R/TPR_Power_Curve.R
  • man/DoseResponseFit.Rd
  • man/plot_tpr_power_curve.Rd
  • man/run_tpr_simulation.Rd
  • tests/testthat/test-ExperimentalDesignSimulation.R
  • tests/testthat/test-TPR_Power_Curve.R
✅ Files skipped from review due to trivial changes (3)
  • man/DoseResponseFit.Rd
  • man/plot_tpr_power_curve.Rd
  • man/run_tpr_simulation.Rd
🚧 Files skipped from review as they are similar to previous changes (2)
  • DESCRIPTION
  • NAMESPACE

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 25-27: The code currently sets control <- min(concentrations),
which contradicts the docstring that control must be 0; change this to
explicitly require/control for 0: validate that 0 is present in the
concentrations vector at the start of the function (stop with an informative
error if not), then set control <- 0 (keep highest <- max(concentrations) and
compute candidates <- setdiff(concentrations, c(control, highest))). Use the
variable names concentrations, control, highest, and candidates to locate and
modify the logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e78deb24-c9be-4966-a4b6-7fbb124766bf

📥 Commits

Reviewing files that changed from the base of the PR and between b1ed032 and cfabb47.

📒 Files selected for processing (4)
  • R/ExperimentalDesignSimulation.R
  • R/TPR_Power_Curve.R
  • man/plot_tpr_power_curve.Rd
  • tests/testthat/test-TPR_Power_Curve.R
✅ Files skipped from review due to trivial changes (2)
  • man/plot_tpr_power_curve.Rd
  • tests/testthat/test-TPR_Power_Curve.R

Comment on lines +245 to +260
for (i in seq_len(nrow(complete_profile))) {
target = complete_profile$dose[i]
if (target == 0) {
# Control: match any zero-dose entry
zero_match = profile[profile$dose_numeric == 0, ]
if (nrow(zero_match) > 0) {
complete_profile$LogIntensities[i] = zero_match$LogIntensities[1]
}
} else if (any(profile$dose_numeric != 0)) {
# Non-zero: find closest on log scale
non_zero = profile[profile$dose_numeric != 0, ]
log_ratios = abs(log10(non_zero$dose_numeric) - log10(target))
best = which.min(log_ratios)
complete_profile$LogIntensities[i] = non_zero$LogIntensities[best]
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this loop is overly complex. Shouldn't the dose_str column of the profile variable match the string version of the complete_profile$dose column? This could be a simple left_join if you have a string version of the dose column in complete_profile.

complete_profile = complete_profile %>%
left_join(profile, by = "dose")
if (nrow(profile) > 0 && !any(profile$dose_numeric != 0)) {
warning("No non-zero dose measurements found for the selected protein. Template will use fallback values.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a stop instead of a warning. It doesn't make sense to be able to make simulations for sample size calculation if your dataset has only dose 0.

Comment on lines +329 to +336
default_template <- NULL
if (is.null(weak_proteins) || is.null(no_interaction_proteins)) {
template1_path <- system.file("extdata", "template1.RDS", package = "MSstatsResponse")
if (file.exists(template1_path)) {
default_template <- readRDS(template1_path)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a hard-coded template specific to the dataset you were given, but not extensible to other datasets. I think if a weak interaction / no interaction protein is not defined, we should fallback to giving the baseline profile.

Comment on lines +340 to +352
weak_interaction = if (!is.null(weak_proteins)) {
.getMeanProfile(weak_proteins, drug_data, concentrations)
} else if (!is.null(default_template)) {
default_template$weak_interaction[default_template$weak_interaction$dose %in% concentrations, ]
} else {
.getMeanProfile(NULL, drug_data, concentrations)
},
no_interaction = if (!is.null(no_interaction_proteins)) {
.getMeanProfile(no_interaction_proteins, drug_data, concentrations)
} else if (!is.null(default_template)) {
default_template$no_interaction[default_template$no_interaction$dose %in% concentrations, ]
} else {
.getMeanProfile(NULL, drug_data, concentrations)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will anything break in the package if weak_interaction / no_interaction is set to the baseline profile (given we never provided any weak interaction / no interaction protein)?


# Extract templates: use user data for specified proteins, defaults for others
template = list(
strong_interaction = .getMeanProfile(strong_proteins, drug_data, concentrations),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused, what does .getMeanProfile return when you defined a non-null strong_proteins? Reading the function, it doesn't return anything.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When protein_ids is non-null, .getMeanProfile returns a data.frame with columns dose, Intensity, LogIntensities, ratio - the function filters drug_data for those proteins, groups by dose, computes mean LogIntensities, then fills missing doses via interpolation. The return is at the end of the function.

result <- selection_order[as.character(k_range)]
result <- result[!sapply(result, is.null)]

if (length(result) == 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances does this occur? Because it would be helpful to have validation earlier on in the function that this ladder can't be constructed, this error message makes it unclear what the user needs to do.

Comment on lines +73 to +74
#' @param data Optional prepared dose-response data for user-based templates.
#' @param protein Optional protein ID for strong interaction template.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these parameters should be required.

Previously, it appeared like the could be optional because we had the template.rds file in the package, but that file is specific to a dataset that Sarah had access to. I think privacy issues made it so that Sarah couldn't put the full dataset and run everything E2E in the package.

#' @noRd
.run_one_tpr_simulation <- function(n_rep, concs, n_proteins, data = NULL,
protein = NULL, seed = 123) {
set.seed(seed + n_rep * 100 + length(concs))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just do set.seed(seed). Users can always adjust their seed number based on number of replicates / length of concentration


temp_res <- do.call(futureExperimentSimulation, sim_args)
temp_res$Hit_Rates_Data |>
dplyr::filter(Category %in% c("TPR (Strong)", "TPR (Weak)")) |>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safe to just filter only "TPR (Strong)" here.

Comment on lines +132 to +136
#' @param data Optional. User's prepared dose-response data (e.g., from
#' \code{MSstatsPrepareDoseResponseFit}). If provided, protein-specific
#' templates are extracted from this data instead of using defaults.
#' @param protein Optional. Character string specifying a protein ID to use as
#' the strong interaction template. Only used when \code{data} is provided.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in another comment, should be required

- Simplify the profile matching loop
- Use Stop instead of warning if there are no non-zero dose
- Remove template1.RDS fallback, use baseline
- Enhance the message for not generating any valid concentration subsets
- Make data and protein required and remove superfluous validations
- Update corresponding docs for turning the optional fields mandatory
- Simplify seed in one tpr simulation
- Filter only TPR strong
 - Update the test cases
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
R/TPR_Power_Curve.R (2)

288-304: Consider adding a guard for empty results.

If simulation_results contains no rows where Interaction == "Strong" (e.g., due to upstream data issues), results_protein will be empty and the plot will render without data or error message. This edge case is unlikely given the current pipeline but could be confusing if encountered.

🛡️ Optional: Add validation for empty data
 plot_tpr_power_curve <- function(simulation_results) {
   k_grid <- sort(unique(simulation_results$NumConcs))

   # Use only the Strong interaction results (user-selected protein template)
   results_protein <- simulation_results[simulation_results$Interaction == "Strong", ]
+
+  if (nrow(results_protein) == 0) {
+    stop("No 'Strong' interaction results found in simulation_results.")
+  }

   p <- .make_tpr_panel(results_protein, "#a6dba0", "#1b7837", k_grid, TRUE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 288 - 304, Add a guard in
plot_tpr_power_curve to handle the case where results_protein
(simulation_results filtered by Interaction == "Strong") is empty: detect if
results_protein has zero rows and either abort with a clear message (e.g., stop
or warning) or return a simple empty/placeholder plotly object with an
explanatory annotation, instead of calling .make_tpr_panel with empty data;
reference results_protein, plot_tpr_power_curve and .make_tpr_panel to locate
where to insert the check and subsequent early return.

97-104: Simplify redundant if_else to direct assignment.

Since line 98 filters to only Category == "TPR (Strong)", the if_else at line 102 will always evaluate to "Strong". The "Weak" branch is dead code.

♻️ Proposed simplification
   temp_res$Hit_Rates_Data |>
     dplyr::filter(Category == "TPR (Strong)") |>
     dplyr::mutate(
       N_rep = n_rep,
       NumConcs = length(concs),
-      Interaction = dplyr::if_else(Category == "TPR (Strong)", "Strong", "Weak")
+      Interaction = "Strong"
     ) |>
     dplyr::select(Interaction, TPR = Percent, N_rep, NumConcs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 97 - 104, The mutate call on
temp_res$Hit_Rates_Data sets Interaction using dplyr::if_else but the preceding
dplyr::filter(Category == "TPR (Strong)") ensures Category is always "TPR
(Strong)", so replace the conditional with a direct assignment: in the
dplyr::mutate that defines Interaction (within the pipeline starting at
temp_res$Hit_Rates_Data |> ...), set Interaction = "Strong" instead of using
dplyr::if_else to remove the dead "Weak" branch and simplify the expression.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 148-156: The example fails because run_tpr_simulation requires
arguments data and protein with no defaults; update the function signature for
run_tpr_simulation to set data = NULL and protein = NULL (or alternatively
update the example to supply minimal mock data/protein or wrap the example in
\dontrun{}), and ensure any internal code in run_tpr_simulation (e.g., checks or
operations that assume non-NULL data/protein) handles NULL appropriately so the
examples run without error.

---

Nitpick comments:
In `@R/TPR_Power_Curve.R`:
- Around line 288-304: Add a guard in plot_tpr_power_curve to handle the case
where results_protein (simulation_results filtered by Interaction == "Strong")
is empty: detect if results_protein has zero rows and either abort with a clear
message (e.g., stop or warning) or return a simple empty/placeholder plotly
object with an explanatory annotation, instead of calling .make_tpr_panel with
empty data; reference results_protein, plot_tpr_power_curve and .make_tpr_panel
to locate where to insert the check and subsequent early return.
- Around line 97-104: The mutate call on temp_res$Hit_Rates_Data sets
Interaction using dplyr::if_else but the preceding dplyr::filter(Category ==
"TPR (Strong)") ensures Category is always "TPR (Strong)", so replace the
conditional with a direct assignment: in the dplyr::mutate that defines
Interaction (within the pipeline starting at temp_res$Hit_Rates_Data |> ...),
set Interaction = "Strong" instead of using dplyr::if_else to remove the dead
"Weak" branch and simplify the expression.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bc5f754e-c855-4899-8011-be5b09124d28

📥 Commits

Reviewing files that changed from the base of the PR and between cfabb47 and bf82171.

📒 Files selected for processing (5)
  • R/ExperimentalDesignSimulation.R
  • R/TPR_Power_Curve.R
  • man/run_tpr_simulation.Rd
  • tests/testthat/test-ExperimentalDesignSimulation.R
  • tests/testthat/test-TPR_Power_Curve.R
✅ Files skipped from review due to trivial changes (3)
  • R/ExperimentalDesignSimulation.R
  • man/run_tpr_simulation.Rd
  • tests/testthat/test-TPR_Power_Curve.R
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/testthat/test-ExperimentalDesignSimulation.R

Comment on lines +148 to +156
#' @examples
#' # Quick example with small simulation (for speed)
#' results <- run_tpr_simulation(
#' rep_range = c(1, 2),
#' concentrations = c(0, 10, 100, 1000),
#' dose_range = c(2, 4),
#' n_proteins = 50
#' )
#' head(results)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Documentation example will fail: data and protein are required but not provided.

The function signature requires data and protein (no default values), but the "Quick example" omits them. Running this example will error:

Error in run_tpr_simulation(rep_range = c(1, 2), ...) : 
  argument "data" is missing, with no default

Either add data = NULL, protein = NULL defaults to the function signature, or update the example to include placeholder/mock data.

🛡️ Option 1: Make parameters optional with NULL defaults
-run_tpr_simulation <- function(rep_range, concentrations, dose_range,
-                                data, protein, n_proteins = 1000) {
+run_tpr_simulation <- function(rep_range, concentrations, dose_range,
+                                data = NULL, protein = NULL, n_proteins = 1000) {
📝 Option 2: Wrap the example in \dontrun{}
-#' # Quick example with small simulation (for speed)
-#' results <- run_tpr_simulation(
+#' \dontrun{
+#' # Example with user-provided data
+#' results <- run_tpr_simulation(
 #'   rep_range = c(1, 2),
 #'   concentrations = c(0, 10, 100, 1000),
 #'   dose_range = c(2, 4),
+#'   data = my_prepared_data,
+#'   protein = "P12345",
 #'   n_proteins = 50
 #' )
-#' head(results)
+#' }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#' @examples
#' # Quick example with small simulation (for speed)
#' results <- run_tpr_simulation(
#' rep_range = c(1, 2),
#' concentrations = c(0, 10, 100, 1000),
#' dose_range = c(2, 4),
#' n_proteins = 50
#' )
#' head(results)
#' `@examples`
#' \dontrun{
#' # Example with user-provided data
#' results <- run_tpr_simulation(
#' rep_range = c(1, 2),
#' concentrations = c(0, 10, 100, 1000),
#' dose_range = c(2, 4),
#' data = my_prepared_data,
#' protein = "P12345",
#' n_proteins = 50
#' )
#' }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 148 - 156, The example fails because
run_tpr_simulation requires arguments data and protein with no defaults; update
the function signature for run_tpr_simulation to set data = NULL and protein =
NULL (or alternatively update the example to supply minimal mock data/protein or
wrap the example in \dontrun{}), and ensure any internal code in
run_tpr_simulation (e.g., checks or operations that assume non-NULL
data/protein) handles NULL appropriately so the examples run without error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants