@igerber igerber commented Feb 8, 2026

Summary

  • Add pytest-xdist for parallel test execution (-n auto --dist worksteal) across all CI jobs
  • Cap min_n at 49 in pure Python mode to reduce bootstrap iterations from 199-249 to 49
  • Widen convergence tolerances conditionally when bootstrap iterations are reduced
  • Refactor TestTROPResults to use class-scoped shared fixture, eliminating 7 redundant TROP fits
  • Combine test_is_significant and test_significance_stars into single test
  • Reduce simulation counts in test_power.py (410 → 190 total simulations)
  • Reduce TROP methodology test data sizes (~60% fewer observations in LOOCV-heavy tests)
  • Remove -x fail-fast flag from CI (incompatible with xdist parallel workers)
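The bootstrap cap described in the summary can be sketched roughly as follows. This is only an illustration: the helper name `bootstrap_iterations`, its signature, and the `pure_python` flag are hypothetical and are not taken from tests/conftest.py. Only the cap value of 49 comes from the summary above.

```python
# Hypothetical sketch of a bootstrap-iteration cap for slow (pure Python) test runs.
# Only the value 49 comes from the PR summary; every name here is an assumption.
PURE_PYTHON_CAP = 49


def bootstrap_iterations(requested: int, min_n: int = 199, pure_python: bool = False) -> int:
    """Return the number of bootstrap iterations a test should run.

    With the compiled backend the requested count is used unchanged.
    In pure Python mode, min_n is capped at PURE_PYTHON_CAP and the
    result never exceeds that capped floor.
    """
    if not pure_python:
        return requested
    capped_min = min(min_n, PURE_PYTHON_CAP)
    return min(requested, capped_min)
```

Under these assumptions, a test that previously asked for a floor of 199 or 249 iterations would run 49 in pure Python mode, which is the reduction the summary describes.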

Methodology references (required if estimator / math changes)

  • N/A - no methodology changes. Only test infrastructure and CI configuration changes.

Validation

  • Tests added/updated: test_ci_params.py (2 updated, 2 new), test_staggered.py, test_methodology_callaway.py, test_trop.py, test_power.py
  • Full test suite verified: 1035 passed, 30 skipped, 0 failures with xdist parallelization
  • All convergence tests pass with wider tolerances under reduced bootstrap iterations

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…ions

Add pytest-xdist for parallel test execution across all CI jobs, cap
bootstrap min_n at 49 in pure Python mode with wider convergence
tolerances, share TROP fixtures via class-scoped fixture to eliminate
7 redundant fits, reduce simulation counts and methodology test data
sizes. Full suite verified: 1035 passed, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot commented Feb 8, 2026

Overall assessment: ✅ Looks good

Executive summary

  • No estimator/methodology changes detected; registry entries and estimator docstrings for CallawaySantAnna and TROP remain unchanged. (docs/methodology/REGISTRY.md:L259-L360, docs/methodology/REGISTRY.md:L546-L680, diff_diff/staggered.py:L149-L219, diff_diff/trop.py:L345-L424)
  • CI/test runtime reductions are confined to the test harness (bootstrap cap in pure-Python mode and reduced simulations). (tests/conftest.py:L107-L118, tests/test_power.py:L393-L533)
  • Minor documentation mismatch: the CLAUDE.md guidance prescribes a 0.15 SE-convergence tolerance for n_boot >= 100, but the CallawaySantAnna SE convergence tests use 0.25/0.20. (CLAUDE.md:L381-L381, tests/test_methodology_callaway.py:L803-L842, tests/test_staggered.py:L1582-L1590, tests/test_staggered.py:L1730-L1741)

Methodology

  • No issues found. Changes are test/CI-only; estimator docstrings and registry requirements remain unaffected. (docs/methodology/REGISTRY.md:L259-L360, docs/methodology/REGISTRY.md:L546-L680, diff_diff/staggered.py:L149-L219, diff_diff/trop.py:L345-L424)

Code Quality

  • No issues found.

Performance

  • No issues found (production runtime unaffected).

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • Severity: P3 — Impact: Test-writing guidance now specifies threshold = 0.40 if n_boot < 100 else 0.15, but CallawaySantAnna SE convergence tests use looser 0.25/0.20 thresholds. This inconsistency can mislead contributors and cause future updates to drift. — Concrete fix: Update the guidance to reflect the per-test tolerances (or tighten the tests to 0.15/0.20 if that’s the intended standard). (CLAUDE.md:L381-L381, tests/test_methodology_callaway.py:L803-L842, tests/test_staggered.py:L1582-L1590, tests/test_staggered.py:L1730-L1741)
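For reference, the guidance rule quoted above can be written as a small helper. The expression `0.40 if n_boot < 100 else 0.15` is taken from the review text. The function names and the idea that convergence is measured as a relative difference between a bootstrap SE and an analytical SE are assumptions made for illustration, not the project's actual test code.

```python
# Sketch of the tolerance rule quoted in the review. Names are hypothetical.
def se_convergence_threshold(n_boot: int) -> float:
    """Looser tolerance when few bootstrap iterations are run."""
    return 0.40 if n_boot < 100 else 0.15


def se_converged(se_boot: float, se_analytic: float, n_boot: int) -> bool:
    """Assumed check: relative gap between bootstrap and analytical SE."""
    rel_diff = abs(se_boot - se_analytic) / se_analytic
    return rel_diff < se_convergence_threshold(n_boot)
```

A test that hard-codes 0.25 or 0.20 instead of calling a shared rule like this is where the documented and actual tolerances can drift apart.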

@igerber igerber merged commit 8e6cf40 into main Feb 8, 2026
8 checks passed
@igerber igerber deleted the shorten-trop-tests branch February 8, 2026 23:04