Draft
70 commits
7a29d76
Add calibration package checkpointing, target config, and hyperparame…
baogorek Feb 17, 2026
f42e6aa
Ignore all calibration run outputs in storage/calibration/
baogorek Feb 17, 2026
29e53f9
Add --lambda-l0 to Modal runner, fix load_dataset dict handling
baogorek Feb 18, 2026
a898ebc
Add --package-path support to Modal runner
baogorek Feb 18, 2026
0a9340b
Add --log-freq for per-epoch calibration logging, fix output dir
baogorek Feb 18, 2026
fa7ebed
Create log directory before writing calibration log
baogorek Feb 18, 2026
13ec69c
Add debug logging for CLI args and command in package path
baogorek Feb 18, 2026
b628997
Fix chunked epoch display and rename Modal output files
baogorek Feb 18, 2026
06c465b
Replace per-clone Microsimulation with per-state precomputation
baogorek Feb 18, 2026
0a0f167
Add Modal Volume support and fix CUDA OOM fragmentation
baogorek Feb 19, 2026
13f3f30
Restrict targets to age demographics only for debugging
baogorek Feb 19, 2026
0b4acf7
Add include mode to target config, switch to age-only
baogorek Feb 20, 2026
32c851b
Switch target config to finest-grain include (~18K targets)
baogorek Feb 20, 2026
5a04c9f
Fix at-large district geoid mismatch (7 districts had 0 estimates)
baogorek Feb 20, 2026
09ae440
Add CLI package validator, drop impossible roth_ira_contributions target
baogorek Feb 20, 2026
5cb6d86
Add population-based initial weights for L0 calibration
baogorek Feb 20, 2026
ba97a90
Drop inflated dollar targets, add ACA PTC, save full package
baogorek Feb 20, 2026
49a1f66
Remove redundant --puf-dataset flag, add national targets
baogorek Feb 20, 2026
40ba0f2
fixing the stacked dataset builder
baogorek Feb 20, 2026
7c38d55
Derive cds_ordered from cd_geoid array instead of database query
baogorek Feb 20, 2026
abe1038
Update notebook outputs from successful calibration pipeline run
baogorek Feb 21, 2026
819a48c
Fix takeup draw ordering mismatch between matrix builder and stacked …
baogorek Feb 24, 2026
02f8ad0
checkpoint with aca_ptc randomness working
baogorek Feb 24, 2026
28b0d63
verify script
baogorek Feb 24, 2026
c1b8f62
Prevent clone-to-CD collisions in geography assignment
baogorek Feb 24, 2026
40fb389
checkpoint
baogorek Feb 25, 2026
cb57217
Fix cross-state cache pollution in matrix builder precomputation
baogorek Feb 25, 2026
b9ed175
bens work on feb 25
baogorek Feb 26, 2026
9e53f60
Selective county-level precomputation via COUNTY_DEPENDENT_VARS
juaristi22 Feb 26, 2026
105bb4a
minor fixes
juaristi22 Feb 26, 2026
23369f3
small optimizations
juaristi22 Feb 26, 2026
c86a263
Parallelize clone loop in build_matrix() via ProcessPoolExecutor
juaristi22 Feb 26, 2026
a69d1ee
Migrate from changelog_entry.yaml to towncrier fragments (#550)
MaxGhenis Feb 24, 2026
0157140
Update package version
MaxGhenis Feb 24, 2026
0c43746
Add end-to-end test for calibration database build pipeline (#556)
MaxGhenis Feb 26, 2026
0a67899
Update package version
MaxGhenis Feb 26, 2026
da5f1eb
Add ETL process for pregnancy calibration targets and update document…
daphnehanse11 Feb 26, 2026
9a30d7c
Add changelog fragment for pregnancy imputation (#563)
daphnehanse11 Feb 26, 2026
9ef9aac
Update package version
baogorek Feb 26, 2026
94bdb47
Migrate from changelog_entry.yaml to towncrier fragments (#550)
MaxGhenis Feb 24, 2026
f543c7f
Update package version
MaxGhenis Feb 24, 2026
3eb3eda
Add end-to-end test for calibration database build pipeline (#556)
MaxGhenis Feb 26, 2026
915fec8
Update package version
MaxGhenis Feb 26, 2026
157e6af
Parallelize clone loop in build_matrix() via ProcessPoolExecutor
juaristi22 Feb 26, 2026
7937331
add target config
baogorek Feb 27, 2026
1b720db
Reorganize calibration modules from local_area_calibration to calibra…
baogorek Feb 27, 2026
519c3c9
Fix modal run command to specify ::main entrypoint
baogorek Feb 27, 2026
422ba05
Fix worker stdout pollution breaking JSON result parsing
baogorek Feb 27, 2026
8e402c7
Add volume-based verification after worker builds
baogorek Feb 27, 2026
a6864b8
Fix at-large district GEOID round-trip conversion
baogorek Feb 27, 2026
d709386
Always fresh-download calibration inputs, clear stale builds
baogorek Feb 27, 2026
45aebc8
Normalize at-large district naming: 00 and 98 both map to 01
baogorek Feb 27, 2026
e3943d2
Enable takeup re-randomization in stacked dataset H5 builds
baogorek Feb 27, 2026
9f7f210
Streamline calibration pipeline: rename, upload, auto-trigger
baogorek Feb 27, 2026
a7a98aa
Add make pipeline: data → upload → calibrate → stage in one command
baogorek Feb 28, 2026
ecc6b0c
documentation
baogorek Feb 28, 2026
fddd03e
flag
baogorek Feb 28, 2026
db880f5
changes to remote calibration runner
baogorek Mar 1, 2026
22ebffd
Script cleanup, validation gating workflow, sanity checks, docs
baogorek Mar 2, 2026
b11a97a
Use source-imputed dataset for H5 staging, upload it from calibration
baogorek Mar 2, 2026
030f125
Fix sanity_checks H5 key lookup for group/period structure
baogorek Mar 2, 2026
6300674
Fix stage-h5s: add ::main entrypoint to modal run
baogorek Mar 2, 2026
51bcf4f
Add validation job to publish workflow, promote target, fix OOM
baogorek Mar 2, 2026
eb87a6b
after county acknowledgement
baogorek Mar 3, 2026
9c632c6
Add provenance tracking, fix takeup rerandomization order, improve Mo…
baogorek Mar 3, 2026
27dda62
Add national H5 pipeline, remove --prebuilt-matrices flag
baogorek Mar 3, 2026
aafbb07
Fix JSON serialization crash: __version__ resolved to module
baogorek Mar 3, 2026
75eefb5
Update upload_local_area_file docstring to list all subdirectories
baogorek Mar 3, 2026
72d7711
Age-only target config for national H5 experiment, fix national builder
baogorek Mar 4, 2026
d38e699
late night work
baogorek Mar 4, 2026
79 changes: 79 additions & 0 deletions .github/bump_version.py
@@ -0,0 +1,79 @@
"""Infer semver bump from towncrier fragment types and update version."""

import re
import sys
from pathlib import Path


def get_current_version(pyproject_path: Path) -> str:
    text = pyproject_path.read_text()
    match = re.search(r'^version\s*=\s*"(\d+\.\d+\.\d+)"', text, re.MULTILINE)
    if not match:
        print(
            "Could not find version in pyproject.toml",
            file=sys.stderr,
        )
        sys.exit(1)
    return match.group(1)


def infer_bump(changelog_dir: Path) -> str:
    fragments = [
        f
        for f in changelog_dir.iterdir()
        if f.is_file() and f.name != ".gitkeep"
    ]
    if not fragments:
        print("No changelog fragments found", file=sys.stderr)
        sys.exit(1)

    categories = {f.suffix.lstrip(".") for f in fragments}
    for f in fragments:
        parts = f.stem.split(".")
        if len(parts) >= 2:
            categories.add(parts[-1])

    if "breaking" in categories:
        return "major"
    if "added" in categories or "removed" in categories:
        return "minor"
    return "patch"


def bump_version(version: str, bump: str) -> str:
    major, minor, patch = (int(x) for x in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    elif bump == "minor":
        return f"{major}.{minor + 1}.0"
    else:
        return f"{major}.{minor}.{patch + 1}"


def update_file(path: Path, old_version: str, new_version: str):
    text = path.read_text()
    updated = text.replace(
        f'version = "{old_version}"',
        f'version = "{new_version}"',
    )
    if updated != text:
        path.write_text(updated)
        print(f" Updated {path}")


def main():
    root = Path(__file__).resolve().parent.parent
    pyproject = root / "pyproject.toml"
    changelog_dir = root / "changelog.d"

    current = get_current_version(pyproject)
    bump = infer_bump(changelog_dir)
    new = bump_version(current, bump)

    print(f"Version: {current} -> {new} ({bump})")

    update_file(pyproject, current, new)


if __name__ == "__main__":
    main()
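To make the bump rule in `.github/bump_version.py` easier to review, here is a minimal sketch that applies the same category logic to plain file names instead of a real `changelog.d/` directory. The helper names (`categories_from_names`, `infer_bump_from_names`, `bump`) and the fragment file names are hypothetical and exist only for this illustration; the precedence (breaking, then added/removed, then patch) is taken from the script above.

```python
from pathlib import PurePath


def categories_from_names(names):
    """Collect candidate fragment types from names like 'my-branch.added.md'."""
    categories = set()
    for name in names:
        path = PurePath(name)
        # File extension: 'md' for 'x.added.md', or 'fixed' for 'x.fixed'.
        categories.add(path.suffix.lstrip("."))
        # Last dotted component of the stem: 'added' for 'x.added.md'.
        parts = path.stem.split(".")
        if len(parts) >= 2:
            categories.add(parts[-1])
    return categories


def infer_bump_from_names(names):
    """Apply the same precedence as infer_bump(): major > minor > patch."""
    categories = categories_from_names(names)
    if "breaking" in categories:
        return "major"
    if "added" in categories or "removed" in categories:
        return "minor"
    return "patch"


def bump(version, kind):
    """Return the next semver string for the given bump kind."""
    major, minor, patch = (int(x) for x in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


# Hypothetical fragment sets, invented for illustration.
examples = {
    "fix only": ["fix-geoid.fixed.md"],
    "feature": ["fix-geoid.fixed.md", "validator.added.md"],
    "breaking": ["validator.added.md", "rename-modules.breaking.md"],
}
for label, names in examples.items():
    kind = infer_bump_from_names(names)
    print(label, kind, bump("1.70.0", kind))
```

Because the highest-priority category wins, a single `breaking` fragment forces a major bump regardless of how many `fixed` or `added` fragments accompany it.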
7 changes: 0 additions & 7 deletions .github/check-changelog-entry.sh

This file was deleted.

65 changes: 60 additions & 5 deletions .github/workflows/local_area_publish.yaml
@@ -4,7 +4,7 @@ on:
   push:
     branches: [main]
     paths:
-      - 'policyengine_us_data/datasets/cps/local_area_calibration/**'
+      - 'policyengine_us_data/calibration/**'
       - '.github/workflows/local_area_publish.yaml'
       - 'modal_app/**'
   repository_dispatch:
@@ -23,7 +23,7 @@ on:
         type: boolean

 # Trigger strategy:
-# 1. Automatic: Code changes to local_area_calibration/ pushed to main
+# 1. Automatic: Code changes to calibration/ pushed to main
 # 2. repository_dispatch: Calibration workflow triggers after uploading new weights
 # 3. workflow_dispatch: Manual trigger with optional parameters

@@ -55,7 +55,7 @@ jobs:
           SKIP_UPLOAD="${{ github.event.inputs.skip_upload || 'false' }}"
           BRANCH="${{ github.head_ref || github.ref_name }}"

-          CMD="modal run modal_app/local_area.py --branch=${BRANCH} --num-workers=${NUM_WORKERS}"
+          CMD="modal run modal_app/local_area.py::main --branch=${BRANCH} --num-workers=${NUM_WORKERS}"

           if [ "$SKIP_UPLOAD" = "true" ]; then
             CMD="${CMD} --skip-upload"
@@ -71,5 +71,60 @@ jobs:
           echo "" >> $GITHUB_STEP_SUMMARY
           echo "Files have been uploaded to GCS and staged on HuggingFace." >> $GITHUB_STEP_SUMMARY
           echo "" >> $GITHUB_STEP_SUMMARY
-          echo "### Next step: Promote to production" >> $GITHUB_STEP_SUMMARY
-          echo "Trigger the **Promote Local Area H5 Files** workflow with the version from the build output." >> $GITHUB_STEP_SUMMARY
+          echo "### Next step: Validation runs automatically" >> $GITHUB_STEP_SUMMARY
+          echo "The validate-staging job will now check all staged H5s." >> $GITHUB_STEP_SUMMARY
+
+  validate-staging:
+    needs: publish-local-area
+    runs-on: ubuntu-latest
+    env:
+      HUGGING_FACE_TOKEN: ${{ secrets.HUGGING_FACE_TOKEN }}
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.13'
+
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v5
+
+      - name: Install dependencies
+        run: uv sync
+
+      - name: Validate staged H5s
+        run: |
+          uv run python -m policyengine_us_data.calibration.validate_staging \
+            --area-type states --output validation_results.csv
+
+      - name: Upload validation results to HF
+        run: |
+          uv run python -c "
+          from policyengine_us_data.utils.huggingface import upload
+          upload('validation_results.csv',
+                 'policyengine/policyengine-us-data',
+                 'calibration/logs/validation_results.csv')
+          "
+
+      - name: Post validation summary
+        if: always()
+        run: |
+          echo "## Validation Results" >> $GITHUB_STEP_SUMMARY
+          if [ -f validation_results.csv ]; then
+            TOTAL=$(tail -n +2 validation_results.csv | wc -l)
+            FAILS=$(grep -c ',FAIL,' validation_results.csv || true)
+            echo "- **${TOTAL}** targets validated" >> $GITHUB_STEP_SUMMARY
+            echo "- **${FAILS}** sanity failures" >> $GITHUB_STEP_SUMMARY
+            echo "" >> $GITHUB_STEP_SUMMARY
+            echo "Review in dashboard, then trigger **Promote** workflow." >> $GITHUB_STEP_SUMMARY
+          else
+            echo "Validation did not produce output." >> $GITHUB_STEP_SUMMARY
+          fi
+
+      - name: Upload validation artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: validation-results
+          path: validation_results.csv
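The "Post validation summary" step derives its two numbers with `tail -n +2 ... | wc -l` and `grep -c ',FAIL,'`. A rough Python equivalent is sketched below for anyone who wants to reproduce the counts locally. The column names and rows in `sample` are assumptions, since the layout of `validation_results.csv` is not shown in this diff, and `summarize` is a hypothetical helper. One difference from the shell version: the sketch counts rows with any field exactly equal to `FAIL`, whereas `grep ',FAIL,'` only matches `FAIL` when it sits between two commas (so not in the first or last column).

```python
import csv
import io


def summarize(csv_text):
    """Return (number of data rows, number of rows with a FAIL field)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    data = rows[1:]  # mirror `tail -n +2`: drop the header line
    fails = sum(1 for row in data if "FAIL" in row)
    return len(data), fails


# Hypothetical file contents; the real columns are not shown in this PR.
sample = (
    "area,target,sanity_check,rel_error\n"
    "CA,population,PASS,0.01\n"
    "TX,population,FAIL,0.42\n"
    "NY,snap,PASS,0.03\n"
)

total, fails = summarize(sample)
print(f"{total} targets validated, {fails} sanity failures")
```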
29 changes: 10 additions & 19 deletions .github/workflows/pr_changelog.yaml
@@ -1,30 +1,21 @@
 name: Changelog entry

 on:
   pull_request:
     branches: [main]

 jobs:
-  check-fork:
+  check-changelog:
+    name: Check changelog fragment
     runs-on: ubuntu-latest
     steps:
-      - name: Check if PR is from fork
+      - uses: actions/checkout@v4
+      - name: Check for changelog fragment
         run: |
-          if [ "${{ github.event.pull_request.head.repo.full_name }}" != "${{ github.repository }}" ]; then
-            echo "❌ ERROR: This PR is from a fork repository."
-            echo "PRs must be created from branches in the main PolicyEngine/policyengine-us-data repository."
-            echo "Please close this PR and create a new one following these steps:"
-            echo "1. git checkout main"
-            echo "2. git pull upstream main"
-            echo "3. git checkout -b your-branch-name"
-            echo "4. git push -u upstream your-branch-name"
-            echo "5. Create PR from the upstream branch"
+          FRAGMENTS=$(find changelog.d -type f ! -name '.gitkeep' | wc -l)
+          if [ "$FRAGMENTS" -eq 0 ]; then
+            echo "::error::No changelog fragment found in changelog.d/"
+            echo "Add one with: echo 'Description.' > changelog.d/\$(git branch --show-current).<type>.md"
+            echo "Types: added, changed, fixed, removed, breaking"
             exit 1
           fi
-          echo "✅ PR is from the correct repository"
-
-  require-entry:
-    needs: check-fork
-    uses: ./.github/workflows/reusable_changelog_check.yaml
-    with:
-      require_entry: true
-      validate_format: true
45 changes: 0 additions & 45 deletions .github/workflows/reusable_changelog_check.yaml

This file was deleted.

15 changes: 9 additions & 6 deletions .github/workflows/versioning.yaml
@@ -7,7 +7,7 @@ on:
       - main

     paths:
-      - changelog_entry.yaml
+      - "changelog.d/**"
       - "!pyproject.toml"

 jobs:
@@ -19,20 +19,23 @@ jobs:
         uses: actions/checkout@v4
         with:
           token: ${{ secrets.POLICYENGINE_GITHUB }}
+          fetch-depth: 0
       - name: Setup Python
         uses: actions/setup-python@v5
         with:
           python-version: 3.12
       - name: Install uv
         uses: astral-sh/setup-uv@v5
-      - name: Build changelog
-        run: pip install yaml-changelog && make changelog
+      - name: Install towncrier
+        run: pip install towncrier
+      - name: Bump version and build changelog
+        run: |
+          python .github/bump_version.py
+          towncrier build --yes --version $(python -c "import re; print(re.search(r'version = \"(.+?)\"', open('pyproject.toml').read()).group(1))")
       - name: Update lockfile
         run: uv lock
-      - name: Preview changelog update
-        run: ".github/get-changelog-diff.sh"
       - name: Update changelog
         uses: EndBug/add-and-commit@v9
         with:
           add: "."
-          message: Update package version
+          message: Update package version
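The "Bump version and build changelog" step reads the new version back out of `pyproject.toml` with an inline regex that differs slightly from the one in `get_current_version()`. The sketch below compares the two patterns on made-up input; the `pyproject.toml` fragments, the `[tool.example]` table, and the helper names are hypothetical and are not taken from the repository. It suggests that the unanchored inline pattern depends on the project's `version = "..."` line being the first such match in the file, which may or may not matter for the real `pyproject.toml`.

```python
import re

# Pattern from the workflow's `python -c` one-liner (unanchored).
INLINE = r'version = "(.+?)"'
# Pattern from get_current_version() in .github/bump_version.py (line-anchored).
ANCHORED = r'^version\s*=\s*"(\d+\.\d+\.\d+)"'


def read_inline(text):
    """Return the first quoted value following 'version = ' anywhere in the text."""
    return re.search(INLINE, text).group(1)


def read_anchored(text):
    """Return the semver value of a line that starts with 'version'."""
    match = re.search(ANCHORED, text, re.MULTILINE)
    return match.group(1) if match else None


# Hypothetical pyproject.toml fragments, invented for illustration.
project_first = (
    '[project]\nversion = "1.70.1"\n\n'
    '[tool.example]\ntarget-version = "py313"\n'
)
tool_first = (
    '[tool.example]\ntarget-version = "py313"\n\n'
    '[project]\nversion = "1.70.1"\n'
)

for label, text in [("project first", project_first), ("tool first", tool_first)]:
    print(label, read_inline(text), read_anchored(text))
```

Reusing the anchored pattern in the workflow step would keep the two lookups consistent, at the cost of a longer one-liner.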
6 changes: 3 additions & 3 deletions .gitignore
@@ -30,12 +30,12 @@ docs/.ipynb_checkpoints/
 ## ACA PTC state-level uprating factors
 !policyengine_us_data/storage/aca_ptc_multipliers_2022_2024.csv

-## Raw input cache for database pipeline
-policyengine_us_data/storage/calibration/raw_inputs/
+## Calibration run outputs (weights, diagnostics, packages, config)
+policyengine_us_data/storage/calibration/

 ## Batch processing checkpoints
 completed_*.txt

 ## Test fixtures
-!policyengine_us_data/tests/test_local_area_calibration/test_fixture_50hh.h5
+!policyengine_us_data/tests/test_calibration/test_fixture_50hh.h5
 oregon_ctc_analysis.py
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,17 @@
+## [1.70.0] - 2026-02-26
+
+### Added
+
+- Add end-to-end test for calibration database build pipeline.
+
+
+## [1.69.4] - 2026-02-24
+
+### Changed
+
+- Migrated from changelog_entry.yaml to towncrier fragments to eliminate merge conflicts.
+
+
 # Changelog

 All notable changes to this project will be documented in this file.