Skip to content

Conversation

@igerber
Copy link
Owner

@igerber igerber commented Feb 8, 2026

Summary

  • Implement ImputationDiD estimator: the efficient imputation approach for staggered DiD from Borusyak, Jaravel & Spiess (2024)
  • Three-step estimation: OLS on untreated observations → impute counterfactuals → aggregate treatment effects
  • Conservative clustered variance (Theorem 3, Equation 7) with configurable auxiliary model partition
  • Event study and group aggregation, pre-trend test (Equation 9), Proposition 5 handling for no never-treated units
  • Optional multiplier bootstrap, covariate support via within-transformation
  • 54 tests, API docs, tutorial notebook, R benchmark scripts, full documentation updates

Methodology references (required if estimator / math changes)

Validation

  • Tests added/updated: tests/test_imputation.py (54 tests covering basic fit, aggregation modes, variance estimation, pre-trend test, bootstrap, cross-validation vs CS/SA, edge cases)
  • Full test suite: 1150 passed, 30 skipped, 0 failures
  • Backtest / simulation / notebook evidence: docs/tutorials/11_imputation_did.ipynb, R benchmark scripts at benchmarks/R/benchmark_didimputation.R

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Implement ImputationDiD, the efficient imputation estimator for staggered
DiD from Borusyak, Jaravel & Spiess (2024) "Revisiting Event-Study Designs:
Robust and Efficient Estimation", Review of Economic Studies.

Core features:
- Three-step estimation: OLS on untreated → impute counterfactuals → aggregate
- Conservative clustered variance (Theorem 3) with configurable auxiliary
  model partition (cohort_horizon, cohort, horizon)
- Event study and group aggregation with balance_e and horizon_max options
- Pre-trend test (Equation 9) independent of treatment effect estimation
- Proposition 5: NaN for unidentified long-run effects without never-treated
- Optional multiplier bootstrap for consistency with other staggered estimators
- Covariate support via within-transformation

Includes 54 tests, API docs, tutorial notebook, R benchmark scripts,
methodology registry entry, and README/ROADMAP/CLAUDE.md updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Feb 8, 2026

Overall Assessment: ⛔ Blocker

Executive Summary

  • ImputationDiD’s conservative variance deviates from the Methodology Registry: τ̃_g is computed as an unweighted mean instead of v_it‑weighted, which can misstate SEs for event‑study/group targets. diff_diff/imputation.py:L1239-L1294
  • Step 1 is not OLS for unbalanced panels (uses simple demeaning/means), conflicting with the paper/Registry requirement and the “unbalanced panels supported” claim. diff_diff/imputation.py:L922-L993
  • Rank‑condition coverage for untreated periods isn’t enforced; missing FE are handled inconsistently between imputation and variance, risking silent NaNs and wrong SEs. diff_diff/imputation.py:L625-L689, diff_diff/imputation.py:L1025-L1039, diff_diff/imputation.py:L1260-L1267
  • Bootstrap inference ignores the user’s cluster parameter and always resamples by unit. diff_diff/imputation.py:L1773-L1812
  • Event‑study plotting may misplace the reference period for ImputationDiD with anticipation>0. diff_diff/visualization.py:L405-L431

Methodology

  • Severity: P0 | Impact: ImputationDiD variance uses unweighted τ̃_g despite Registry specifying v_it‑weighted averages; this breaks Theorem 3’s auxiliary residuals and can bias SEs for event‑study/group aggregations. | Fix: In _compute_auxiliary_residuals_treated, compute τ̃_g as sum(v_it * τ̂_it) / sum(v_it) within each partition group (guard zero weights), and remove the “simple average” comment. diff_diff/imputation.py:L1239-L1294
  • Severity: P0 | Impact: Step 1 is not OLS on untreated observations for unbalanced panels (uses unit/time means and double‑demeaning), so imputation can be biased even though the estimator claims unbalanced support. | Fix: Replace mean‑based FE recovery with an actual OLS/absorbed‑FE solution on Ω0 (e.g., solve with unit/time dummies or a proper two‑way FE projection for unbalanced panels) for both covariate and no‑covariate paths. diff_diff/imputation.py:L922-L993
  • Severity: P1 | Impact: Rank‑condition checks are incomplete (no check for post‑treatment periods lacking untreated units); missing FE yield NaNs in τ̂ but variance uses fillna(0), producing inconsistent inference. | Fix: Add explicit unit/time coverage checks and honor rank_deficient_action (warn/error/silent); align missing‑FE handling between _impute_treatment_effects and _compute_auxiliary_residuals_treated (either drop affected obs and renormalize weights or set all inference to NaN). diff_diff/imputation.py:L625-L689, diff_diff/imputation.py:L1025-L1039, diff_diff/imputation.py:L1260-L1267
  • Severity: P1 | Impact: Bootstrap inference ignores cluster when it differs from unit, so bootstrap SEs/p‑values are wrong for clustered inference. | Fix: Build bootstrap weights at cluster_var granularity (not unit) and map to observations accordingly. diff_diff/imputation.py:L1773-L1812

Code Quality

  • Severity: P2 | Impact: plot_event_study reference‑period inference only checks n_groups==0; ImputationDiD uses n_obs==0, so reference is wrong when anticipation>0. | Fix: Treat n_obs==0 as a reference marker for ImputationDiD results in _extract_plot_data. diff_diff/visualization.py:L405-L431

Performance

  • Severity: P3 | Impact: _compute_v_untreated_with_covariates builds dense dummy matrices, which is O(N·(U+T)) memory/time and may not scale to large panels. | Fix: Use sparse matrices or a projection/within‑transform approach that avoids explicit dummies. diff_diff/imputation.py:L1181-L1236

Maintainability

  • No material issues found in the diff.

Tech Debt

  • No material issues found in the diff.

Security

  • No material issues found in the diff.

Documentation/Tests

  • Severity: P2 | Impact: Tests don’t cover rank‑condition failures for missing untreated time periods, cluster!=unit bootstrap behavior, or plotting reference detection with anticipation>0; unbalanced‑panel coverage only asserts the code runs, not correctness. | Fix: Add targeted tests for those scenarios (e.g., a panel with a treated period lacking untreated obs; bootstrap with cluster set to a higher level; plot reference inference for anticipation=1). tests/test_imputation.py:L444-L461, tests/test_imputation.py:L618-L710

…ring

- Fix 1 (P0): Replace one-pass demeaning with iterative Gauss-Seidel for
  exact FE on unbalanced panels (add _iterative_fe helper)
- Fix 2 (P0): Use v_it-weighted tau_tilde_g in auxiliary residuals per Eq 8
- Fix 3 (P1): Add rank-condition checks after Step 1, fix fillna(0.0)->NaN
  consistency in auxiliary residuals, handle NaN in v*eps product
- Fix 4 (P1): Bootstrap generates weights per cluster_var (not always unit),
  add cluster column validation
- Fix 5 (P2): plot_event_study detects reference via n_obs==0 (ImputationDiD)
- Fix 6 (P3): Use scipy.sparse for FE dummy matrices in covariate v_it
- Fix 7: Add 6 new tests, update REGISTRY.md edge cases and docstrings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 9, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 3e93375db98e044aad07c7f1a4ddb6aaef130792


Overall assessment: ⚠️ Needs changes

Executive summary:

  • ImputationDiD’s core 3‑step estimator and Thm 3 variance structure appear aligned with the Methodology Registry, but there are three methodology/inference mismatches that need fixes.
  • The pre‑trend test uses one‑pass demeaning, which is not exact for unbalanced panels (registry requires iterative FE).
  • Overall ATT SEs are computed with weights over all treated observations even when tau_hat is NaN for some treated obs, so the variance targets a different estimand than the reported ATT.
  • balance_e is applied to event‑study aggregation but ignored in bootstrap event‑study inference, so bootstrap SEs/p‑values are computed on a different sample.

Methodology

  • P1 — Pre‑trend test FE demeaning is not exact for unbalanced panels. Impact: F‑stat/p‑value can be biased when the panel is unbalanced, conflicting with the registry’s requirement to use iterative FE (Gauss‑Seidel) for exact two‑way FE. Concrete fix: reuse the iterative demeaning logic from _fit_untreated_model (or run solve_ols with explicit unit/time dummies) for both y and X before the Wald test. (diff_diff/imputation.py:L1786-L1822)
  • P1 — Overall ATT variance uses uniform weights over all treated obs while the ATT averages only finite tau_hat. Impact: when rank conditions fail (missing FE) or Prop. 5 forces NaNs, the SE targets a different estimand and can be understated. Concrete fix: zero out weights for non‑finite tau_hat and renormalize to sum to 1, or return NaN for overall ATT/SE when any treated obs is unimputable. (diff_diff/imputation.py:L748-L773)
  • P1 — balance_e affects event‑study aggregation but not bootstrap inference. Impact: bootstrap SEs/p‑values are computed on all treated obs at horizon h, not the balanced cohort subset, so inference targets a different estimand. Concrete fix: persist the balanced cohort mask (or recompute it) inside _run_bootstrap and apply it to h_mask when forming bootstrap event‑study effects; add a regression test. (diff_diff/imputation.py:L1472-L1499, diff_diff/imputation.py:L1942-L1954)

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P2 — Missing tests for key parameter interactions and edge‑case inference. Impact: regressions in bootstrap/event‑study inference or rank‑deficiency handling may slip through. Concrete fix: add tests in tests/test_imputation.py to cover balance_e + n_bootstrap > 0, and to assert overall ATT/SE behavior when some tau_hat are NaN due to rank condition violations.

…tstrap balance_e

- Extract _iterative_demean as static method; fix _pretrend_test to use
  iterative demeaning (exact for unbalanced panels) instead of one-pass
- Overall ATT SE: zero-weight NaN tau_hat observations so variance matches
  the finite-only ATT estimand; guard n_valid==0 returning NaN SE
- Extract _compute_balanced_cohort_mask helper; apply in bootstrap
  event-study loop so bootstrap SEs respect balance_e filtering
- Add 4 new tests covering all three fixes
- Update REGISTRY.md edge cases documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 9, 2026

/ai-review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 7495f0305a5fe5e0f46a4e5468f0d956170622c9


Overall assessment: ⚠️ Needs changes

Executive summary

  • ImputationDiD implementation tracks the Registry (Step 1/2/3, Theorem 3 variance, Prop 5 handling, NaN propagation) and is well covered by tests.
  • P1: Missing absorbing‑treatment/first_treat consistency validation, which is a required assumption in the Methodology Registry and can silently invalidate estimates.
  • No other material methodology, edge‑case, or security issues found in the diff.

Methodology

  • P1 — Missing absorbing‑treatment validation. Impact: The Registry requires absorbing treatment (once treated, always treated). fit uses row‑level first_treat directly to build D_it without checking it’s constant per unit; if first_treat varies within a unit (or encodes reversals), the estimator and SEs are invalid with no warning. Concrete fix: add a unit‑level check after type conversion to ensure first_treat is constant within unit (and optionally warn/error on violations), similar to the absorbing‑treatment warning in MultiPeriodDiD; add a regression test for non‑constant first_treat. diff_diff/imputation.py:622-676

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • No issues found; tests are extensive.

…putationDiD

Emit UserWarning when first_treat varies within a unit, which violates the
absorbing treatment assumption. The estimator proceeds but warns users that
results may be unreliable. Adds test and REGISTRY.md documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 9, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 3f361b64d54401ffa7b49609ce89ffcb6413ba82


Overall assessment: ⚠️ Needs changes

Executive summary

  • P1: Multiplier bootstrap uses uncentered tau_hat rather than influence‑function/centered contributions, so bootstrap SEs/p‑values are likely incorrect for overall, event‑study, and group effects.
  • P2: treatment_effects["weight"] doesn’t renormalize when some tau_hat are NaN, so downstream re‑aggregation can disagree with the estimator’s ATT.
  • P2: Covariate variance path forms a dense (A0'A0) matrix that can blow up memory for large panels; consider sparse/iterative solves or guardrails.
  • P2: If covariates are rank‑deficient and dropped in Step 1, the variance path still includes them, which can make SEs inconsistent with the fitted model.

Methodology

  • Severity: P1 | Impact: Bootstrap inference is not aligned with the estimator’s influence function; it weights raw tau_hat with Rademacher draws (uncentered), which ignores first‑stage estimation error and produces non‑zero variance even in constant‑effect/no‑noise settings. This can materially misstate uncertainty for all bootstrap outputs. | Fix: Implement multiplier bootstrap on the Theorem‑3 influence function: compute psi_it = v_it * epsilon_tilde_it, sum within clusters, multiply cluster sums by Rademacher weights, and use the resulting draws for SE/CI/p‑values; or re‑fit via cluster bootstrap. diff_diff/imputation.py:L2015-L2069
  • Severity: P2 | Impact: With rank‑deficient covariates, Step 1 drops/zeroes some coefficients, but the covariate variance path still uses the full covariate design, so v_untreated can be inconsistent with the actual fitted model, distorting SEs. | Fix: Track the kept covariate columns from solve_ols (or a boolean mask) and build A_0/A_1 using only those; alternatively, honor rank_deficient_action here and error/warn consistently. diff_diff/imputation.py:L1420-L1441

Code Quality

  • Severity: P2 | Impact: treatment_effects["weight"] is always 1/n_treated, even when some treated observations have tau_hat = NaN and the estimator renormalizes over finite values; users re‑aggregating from this table will get a different ATT. | Fix: Set weights to 0 for NaN tau_hat and renormalize to sum to 1 over finite observations, or explicitly document that this column is nominal. diff_diff/imputation.py:L867-L871

Performance

  • Severity: P2 | Impact: A0tA0 = (A_0.T @ A_0).toarray() scales as O((U+T+K)^2) and becomes prohibitive for large panels, risking OOM. | Fix: Solve the normal equations with sparse solvers (e.g., spsolve/iterative CG on A_0.T @ A_0 without densifying) or add guardrails/warnings when p is large. diff_diff/imputation.py:L1434-L1441

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • Severity: P3 | Impact: New tests cover bootstrap reproducibility but not correctness; the mis‑centering/influence‑function issue would not be detected. | Fix: Add a test where the analytical SE is known (or residuals are zero) and assert bootstrap SE ≈ 0 or matches the influence‑function variance. tests/test_imputation.py

…riance, weight/covariate fixes

- Fix bootstrap to use Theorem 3 influence function (psi_i = sum_t v_it * epsilon_tilde_it)
  instead of perturbing raw tau_hat, correctly accounting for first-stage estimation error
- Add sparse solver (spsolve) for variance normal equations to avoid O(p^2) memory blow-up,
  with dense lstsq fallback
- Exclude rank-deficient covariates from variance design matrices A_0/A_1 via kept_cov_mask
- Renormalize treatment_effects weights: 0 for NaN tau_hat, 1/n_valid for finite
- Add 6 new tests: zero-noise bootstrap SE, sparse solver, dense fallback, weight consistency,
  rank-deficient covariates, bootstrap precomputation failure warning
- Update REGISTRY.md edge cases documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 9, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: d62934225e7bc64f9747b94ae4ab974d40320721


Overall assessment: ⚠️ Needs changes

Executive summary

  • P1: balance_e does not enforce pre‑treatment balance as specified in the Methodology Registry/docs; it only checks horizons present in treated observations.
  • The core Imputation DiD steps (Omega₀ OLS → impute → Theorem 3 variance) otherwise look aligned with the registry and docstrings.
  • Maintainability: aggregation/bootstrap weight logic is duplicated and could drift.
  • Tests are extensive, but there is no assertion that balance_e actually enforces pre‑treatment balance.

Methodology

  • P1 | Impact: balance_e is documented to restrict cohorts to those observed at all relative times in [-balance_e, max_h], but the implementation only considers horizons among treated observations (typically h >= 0). This allows cohorts missing required pre‑treatment periods to pass the “balanced” filter, affecting event‑study estimates and SEs (and their bootstrap counterparts). | Concrete fix: Compute cohort horizons using all observations for treated units (including pre‑treatment periods), and build required_range explicitly over the full integer range [-balance_e, max_h] (or include negative horizons explicitly) before masking treated observations.
    Locations: diff_diff/imputation.py:L1105-L1138, diff_diff/imputation.py:L1664-L1678.

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • P3 | Impact: Weight construction for event‑study and group aggregations is duplicated between _aggregate_event_study/_aggregate_group and _precompute_bootstrap_psi, which risks analytical and bootstrap paths diverging if logic changes later. | Concrete fix: Extract a shared helper to compute weights/masks per horizon or cohort and reuse it in both aggregation and bootstrap precompute.
    Locations: diff_diff/imputation.py:L1640-L1802, diff_diff/imputation.py:L2095-L2158.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P2 | Impact: Current balance_e test only checks horizon counts, so it would not catch missing pre‑treatment balance (the P1 issue). | Concrete fix: Add a test that removes a pre‑treatment period for one cohort and asserts balance_e drops that cohort (and that bootstrap uses the same mask).
    Location: tests/test_imputation.py:L275-L300.

…riods

_compute_balanced_cohort_mask had two bugs preventing pre-treatment
balance checking: (1) required_range filtered from post-treatment-only
all_horizons, missing negative values; (2) cohort horizons were built
from Omega_1 (post-treatment), so negative rel_times were never seen.

Fix: add _build_cohort_rel_times() to scan the full panel, and use
range(-balance_e, max_h+1) for the required set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 9, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: e3a69d394994145f4e637770fd65bf33a089ef26


Overall Assessment

  • ⚠️ Needs changes

Executive Summary

  • P1: Non-constant first_treat handling conflicts with the Methodology Registry and can violate the absorbing-treatment assumption; needs enforcement or doc correction.
  • P3: Bootstrap weight types are documented beyond what’s implemented (rademacher-only); clarify or add parameter.
  • P3: Weight construction is duplicated between aggregation and bootstrap paths, risking drift.
  • Tests not run in this review.

Methodology

  • Severity: P1 | Impact: The estimator only warns on non-constant first_treat but still uses per-row values to build _treated and _rel_time, which can create non‑absorbing treatment paths and invalidate identification; this contradicts the Registry’s statement that it proceeds using the unit-level first observed value. | Concrete fix: After the warning, coerce df[first_treat] to a per‑unit value (e.g., transform("first") or validated earliest positive) before computing _treated/_rel_time, or raise on inconsistency; then align the Registry text accordingly. diff_diff/imputation.py:L632-L671, docs/methodology/REGISTRY.md:L520-L531

Code Quality

  • No issues found in changed code paths.

Performance

  • Severity: P3 | Impact: Event‑study SEs recompute conservative variance per horizon, which rebuilds and solves the variance system each time; this can become expensive with many horizons/covariates. | Concrete fix: Cache A_0/A_1 and the A_0'A_0 factorization, or add a batch path for multiple horizon weights. diff_diff/imputation.py:L1772-L1800

Maintainability

  • Severity: P3 | Impact: Weight construction is duplicated between aggregation and bootstrap precompute, risking inconsistencies if one side changes. | Concrete fix: Extract shared weight builder helpers for overall/horizon/group and reuse in both call sites. diff_diff/imputation.py:L2122-L2164

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • Severity: P3 | Impact: Bootstrap results docstring advertises "mammen"/"webb" weights but implementation hardcodes rademacher, which can mislead users. | Concrete fix: Add a weight_type parameter wired to _generate_bootstrap_weights_batch, or update docstrings/docs to state rademacher-only. diff_diff/imputation.py:L46-L50, diff_diff/imputation.py:L2288-L2291
  • Severity: P3 | Impact: Test status unknown for this review. | Concrete fix: Run pytest tests/test_imputation.py (and ideally the full suite) before merge.

…cstring

- Coerce non-constant first_treat to per-unit value via transform("first")
  after warning, so downstream code (_never_treated, _treated, _rel_time)
  uses a consistent value matching the Registry documentation
- Fix ImputationBootstrapResults docstring: weight_type is rademacher-only,
  not "rademacher, mammen, webb"
- Add behavioral assertion to test verifying coercion actually applied

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 9, 2026

/ai-review

1 similar comment
@igerber
Copy link
Owner Author

igerber commented Feb 9, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 1db13c259f5ae91d9aa326e20ed4f2bd90701d15


Overall assessment: ⚠️ Needs changes

Executive summary

  • Method implemented (ImputationDiD) aligns with REGISTRY for Steps 1–3, Theorem 3 variance, Prop. 5, and pre-trend test; no P0/P1 methodology mismatches found.
  • P1: Event-study aggregation can silently return only the reference marker when filters remove all treated horizons; add explicit empty-result handling.
  • P2: Bootstrap inference uses normal-approx p-values/CIs rather than percentile/empirical bootstrap like other estimators; align or document.

Methodology

  • P1 | Impact: aggregate='event_study' with balance_e and/or horizon_max can filter out all treated observations and silently return only the reference period (n_obs=0), which looks like valid output but is an empty result set. | Fix: detect when no horizons have n_obs>0 after filtering, emit a warning or raise, and add a regression test. diff_diff/imputation.py:L1670-L1746
  • P2 | Impact: Bootstrap inference uses normal-approx p-values/CIs from bootstrap SEs, which diverges from the percentile/empirical bootstrap used elsewhere (CS/SA). This can understate tail risk and surprises users expecting distribution-based inference. | Fix: compute percentile CIs and bootstrap p-values from boot_* distributions, or explicitly document the normal-approx behavior in REGISTRY/docs. diff_diff/imputation.py:L2245-L2290

Code Quality

  • P3 | Impact: Unused local omega_1_indices is dead code and adds noise. | Fix: remove the unused variable. diff_diff/imputation.py:L1777-L1781

Performance

No issues found.

Maintainability

  • P3 | Impact: Weight-construction logic for event-study/group effects is duplicated in _aggregate_* and _precompute_bootstrap_psi, increasing drift risk. | Fix: factor weight creation into a shared helper and reuse in both paths. diff_diff/imputation.py:L1670-L1919, diff_diff/imputation.py:L2126-L2188

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

No issues found beyond the P1/P2 items above.

… warning, dead code

- Replace normal-approx bootstrap CIs/p-values with percentile/empirical
  method, matching CallawaySantAnna/SunAbraham convention. Shift
  perturbation-centered draws by original effect before computing CIs.
- Warn when balance_e/horizon_max filter out all treated horizons,
  leaving only the reference period marker in event study results.
- Remove unused omega_1_indices variable (ruff F841).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 9, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 137735828a21bfa49128f7d6637885211a88b9fa


Overall assessment: ✅ Looks good

Executive summary

  • ImputationDiD matches the Methodology Registry and cited Borusyak‑Jaravel‑Spiess (2024) requirements (Steps 1–3, Theorem 3 variance, Eq. 8 auxiliary residuals, Eq. 9 pre‑trend test, Proposition 5 handling); no P0/P1 mismatches found. (docs/methodology/REGISTRY.md:L451-L552, diff_diff/imputation.py:L1178-L2082)
  • Minor maintainability risk: weight construction is duplicated between aggregation and bootstrap precompute, which could drift.
  • Test gap: bootstrap + covariate path is untested.

Methodology

  • No issues found; implementation aligns with registry requirements and in‑code docstrings. (docs/methodology/REGISTRY.md:L451-L552, diff_diff/imputation.py:L1178-L2082)

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • P3 | Impact: Weighting logic for horizons/groups is duplicated, so future changes to aggregation weights can silently desync bootstrap SEs from analytic aggregation. | Concrete fix: factor weight construction into shared helpers and reuse in _aggregate_event_study/_aggregate_group and _precompute_bootstrap_psi. (diff_diff/imputation.py:L1670-L1851, diff_diff/imputation.py:L2185-L2220)

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P3 | Impact: Bootstrap with covariates (sparse projection in v_untreated) is not exercised; regressions in the covariate+bootstrap path could ship unnoticed. | Concrete fix: add a test that fits with covariates and n_bootstrap>0, asserting finite bootstrap SEs and stable results. (tests/test_imputation.py:L881-L1030, diff_diff/imputation.py:L1383-L1462)

@igerber igerber merged commit 896e704 into main Feb 9, 2026
7 checks passed
@igerber igerber deleted the imputation-estimator branch February 9, 2026 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant