Description
Discovery
While building a pipeline comparison diagram, I traced through the code and calibration logs that produced the published HuggingFace district files (e.g. CA-04.h5). The production local area calibration used only 588 targets, far fewer than expected and a small fraction of the 37,758 available in the database.
Evidence
1. The target filter is extremely restrictive
fit_calibration_weights.py lines 105-115:
targets_df, X_sparse, household_id_mapping = builder.build_matrix(
    sim,
    target_filter={
        "stratum_group_ids": [4, 7],  # 4=SNAP households, 7=state income tax
        "variables": [
            "health_insurance_premiums_without_medicare_part_b",
            "snap",
            "state_income_tax",  # Census STC state income tax collections
        ],
    },
)

This uses OR logic (sparse_matrix_builder.py line 166), selecting only targets that match stratum groups 4 or 7 OR have those three variable names.
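To make the OR semantics concrete, here is a minimal sketch that contrasts OR and AND matching on a handful of made-up target rows. The row layout (`stratum_group_id`, `variable`), the helper names, and the variable `employment_income` are assumptions for illustration; this is not the code in sparse_matrix_builder.py.

```python
# Hypothetical target rows. The real schema of targets_df is not shown in
# this issue, so these field names and values are illustrative only.
targets = [
    {"stratum_group_id": 4, "variable": "household_count"},
    {"stratum_group_id": 7, "variable": "state_income_tax"},
    {"stratum_group_id": 2, "variable": "snap"},
    {"stratum_group_id": 2, "variable": "employment_income"},
    {"stratum_group_id": 3, "variable": "employment_income"},
]

# Same shape as the filter passed to build_matrix in the snippet above.
target_filter = {
    "stratum_group_ids": [4, 7],
    "variables": [
        "health_insurance_premiums_without_medicare_part_b",
        "snap",
        "state_income_tax",
    ],
}


def matches_or(row, flt):
    """Keep a row if it satisfies ANY criterion (OR logic)."""
    return (
        row["stratum_group_id"] in flt["stratum_group_ids"]
        or row["variable"] in flt["variables"]
    )


def matches_and(row, flt):
    """Keep a row only if it satisfies EVERY criterion (AND logic)."""
    return (
        row["stratum_group_id"] in flt["stratum_group_ids"]
        and row["variable"] in flt["variables"]
    )


or_selected = [r for r in targets if matches_or(r, target_filter)]
and_selected = [r for r in targets if matches_and(r, target_filter)]

print(len(or_selected), "rows kept under OR")
print(len(and_selected), "rows kept under AND")
```

In this toy data the OR filter keeps a `household_count` row purely because of its stratum group, even though `household_count` is not in the variable list. Everything outside groups 4 and 7 and outside the three named variables is dropped either way.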
2. Production calibration log confirms 588 targets
calibration_log_20260131_153028.csv (the 5,000-epoch production run) contains 588 targets broken down as:
| Geo level | Variable | Count |
|---|---|---|
| CD | household_count | 350 |
| State | household_count | 51 |
| State | snap | 51 |
| State | state_income_tax | 44 |
| National | person_count | 4 |
| National | snap | 1 |
| National | health_insurance_premiums_without_medicare_part_b | 1 |
| Sub-state | household_count | 86 |
| Total | | 588 |
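As a quick arithmetic check, the breakdown can be tallied by geography level. The counts below are copied from the table; the script only sums them and does not read calibration_log_20260131_153028.csv.

```python
from collections import defaultdict

# (geo level, variable, count) rows copied from the breakdown table.
rows = [
    ("CD", "household_count", 350),
    ("State", "household_count", 51),
    ("State", "snap", 51),
    ("State", "state_income_tax", 44),
    ("National", "person_count", 4),
    ("National", "snap", 1),
    ("National", "health_insurance_premiums_without_medicare_part_b", 1),
    ("Sub-state", "household_count", 86),
]

# Sum target counts per geography level.
by_level = defaultdict(int)
for level, _variable, count in rows:
    by_level[level] += count

total = sum(by_level.values())

for level, count in by_level.items():
    print(f"{level:<10} {count:>4}")
print(f"{'Total':<10} {total:>4}")
```

The per-level sums come to 350 (CD), 146 (State), 6 (National), and 86 (Sub-state), which add up to the 588 reported in the log.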
3. The ONLY CD-level variable is household_count
Out of 436 congressional districts, only 350 have a target at all, and that target is exclusively household_count. No CD-level SNAP, income, employment, demographics, or any other variable is being calibrated.
4. The weight file is confirmed byte-identical
w_district_calibration.npy (the file uploaded to HuggingFace and used by modal_app/local_area.py to publish district H5 files) is byte-identical to calibration_weights_20260131_153028.npy. This confirms the 588-target, 5,000-epoch run is the one that produced all published district files.
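For anyone who wants to reproduce the comparison, a byte-identity check can be sketched with the standard library alone. The helper names are my own and the file paths in the usage comment assume both files are in the working directory; this is one way to run the check, not necessarily how it was originally done.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True if both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # different sizes can never be byte-identical
    return sha256_of(a) == sha256_of(b)


# Usage (assumes both files are present locally):
# byte_identical(
#     "w_district_calibration.npy",
#     "calibration_weights_20260131_153028.npy",
# )
```

`filecmp.cmp(path_a, path_b, shallow=False)` from the standard library does a direct byte-by-byte comparison and is an equally valid alternative. The hash-based version has the advantage that the digest can be pasted into the issue for others to verify.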
5. The database has 37,758 targets available
The policy_data.db contains 37,758 targets across 27 stratum groups. Only 588 were used due to the restrictive filter. 86 CDs have zero targets.
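The coverage figures follow directly from the numbers quoted in this issue. The sketch below only does the arithmetic on those figures and does not query policy_data.db.

```python
# Figures quoted in this issue.
targets_used = 588
targets_available = 37_758
total_cds = 436
cds_with_target = 350

# Share of available database targets that the production run used.
coverage = targets_used / targets_available

# Districts with no calibration target at all.
cds_without_target = total_cds - cds_with_target
share_without_target = cds_without_target / total_cds

print(f"Targets used: {targets_used} of {targets_available} ({coverage:.2%})")
print(
    f"CDs with zero targets: {cds_without_target} of {total_cds} "
    f"({share_without_target:.1%})"
)
```

On these figures the production run used roughly 1.6% of the available targets, and about one in five districts had no target at all.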
Impact
Every published district-level file on HuggingFace was produced by a calibration that only matched household counts at the CD level. CD-level income distributions, SNAP participation, tax collections, and every other CD-level variable are entirely uncalibrated: they reflect whatever the stratified sampling and geography assignment happened to produce.
Context
The Arnold 2026 slides (CalibrationSlide.tsx) claimed 20,994 targets (34 target groups). No evidence supports this number in any actual calibration run. Maria's PR #489 ("Supporting all calibration targets") would have expanded coverage but hasn't been merged.
The new unified pipeline (PR #516) addresses this by using all 37,749+ DB-driven targets in a single solve.
@baogorek, does this match your understanding? Am I missing a different calibration run that used more CD-level targets? The only logs and weight files I found locally point to this 588-target run.