
Local area calibration used only 1 variable (household_count) at CD level — 588 total targets #517

@MaxGhenis


Discovery

While building a pipeline comparison diagram, I traced through the code and calibration logs that produced the published HuggingFace district files (e.g. CA-04.h5). The production local area calibration used far fewer targets than expected.

Evidence

1. The target filter is extremely restrictive

fit_calibration_weights.py lines 105-115:

```python
targets_df, X_sparse, household_id_mapping = builder.build_matrix(
    sim,
    target_filter={
        "stratum_group_ids": [4, 7],  # 4=SNAP households, 7=state income tax
        "variables": [
            "health_insurance_premiums_without_medicare_part_b",
            "snap",
            "state_income_tax",  # Census STC state income tax collections
        ],
    },
)
```

This uses OR logic (sparse_matrix_builder.py line 166): a target is kept if it belongs to stratum group 4 or 7, or if its variable is one of those three names. Every other target is dropped.
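The OR filter can be sketched as follows. This is a minimal illustration of the described behavior, not the actual code in sparse_matrix_builder.py. It assumes each target is a dict with hypothetical `stratum_group_id` and `variable` keys, and the toy targets are invented for illustration.

```python
# Sketch of OR-style target filtering (hypothetical target dict layout).
from typing import Iterable


def select_targets(targets: Iterable[dict], target_filter: dict) -> list[dict]:
    """Keep a target if its stratum group OR its variable is in the filter."""
    groups = set(target_filter.get("stratum_group_ids", []))
    variables = set(target_filter.get("variables", []))
    return [
        t for t in targets
        if t["stratum_group_id"] in groups or t["variable"] in variables
    ]


# Toy targets, for illustration only.
targets = [
    {"stratum_group_id": 4, "variable": "household_count", "geo": "State"},
    {"stratum_group_id": 1, "variable": "snap", "geo": "National"},
    {"stratum_group_id": 2, "variable": "employment_income", "geo": "CD"},
    {"stratum_group_id": 3, "variable": "person_count", "geo": "CD"},
]

target_filter = {
    "stratum_group_ids": [4, 7],
    "variables": [
        "health_insurance_premiums_without_medicare_part_b",
        "snap",
        "state_income_tax",
    ],
}

selected = select_targets(targets, target_filter)
print([t["variable"] for t in selected])  # ['household_count', 'snap']
```

Anything outside those two groups and three variables, such as the CD-level `employment_income` row above, never reaches the solver.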

2. Production calibration log confirms 588 targets

calibration_log_20260131_153028.csv (the 5,000-epoch production run) contains 588 targets broken down as:

| Geo level | Variable | Count |
|---|---|---:|
| CD | household_count | 350 |
| State | household_count | 51 |
| State | snap | 51 |
| State | state_income_tax | 44 |
| National | person_count | 4 |
| National | snap | 1 |
| National | health_insurance_premiums_without_medicare_part_b | 1 |
| Sub-state | household_count | 86 |
| Total | | 588 |
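As a sanity check on the breakdown, the counts from the table can be tallied per geography level. This sketch hardcodes those counts rather than reading the log CSV, because the log's column layout isn't shown here.

```python
# Tally the production-run breakdown (counts copied from the table).
from collections import Counter

breakdown = Counter({
    ("CD", "household_count"): 350,
    ("State", "household_count"): 51,
    ("State", "snap"): 51,
    ("State", "state_income_tax"): 44,
    ("National", "person_count"): 4,
    ("National", "snap"): 1,
    ("National", "health_insurance_premiums_without_medicare_part_b"): 1,
    ("Sub-state", "household_count"): 86,
})

# Subtotal per geography level, preserving first-seen order.
by_level = Counter()
for (level, _variable), n in breakdown.items():
    by_level[level] += n

total = sum(breakdown.values())
cd_variables = {var for (level, var) in breakdown if level == "CD"}

print(total)           # 588
print(dict(by_level))  # {'CD': 350, 'State': 146, 'National': 6, 'Sub-state': 86}
print(cd_variables)    # {'household_count'}
```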

3. The ONLY CD-level variable is household_count

Out of 436 congressional districts, only 350 have a target at all, and that target is exclusively household_count. No CD-level SNAP, income, employment, demographics, or any other variable is being calibrated.
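Uncovered districts fall out of a simple set difference. In this sketch, both inputs are hypothetical. In practice `all_cds` would come from the district list and `targeted_cds` from the geography of each CD-level target. The toy IDs are for illustration only.

```python
# Sketch: report which districts have no calibration target at all.
def coverage_report(all_cds: list[str], targeted_cds: set[str]) -> dict:
    """Count targeted districts and list the ones with no target."""
    missing = sorted(set(all_cds) - set(targeted_cds))
    return {
        "total": len(all_cds),
        "targeted": len(set(all_cds) & set(targeted_cds)),
        "missing": missing,
    }


# Toy district IDs, for illustration only.
report = coverage_report(
    ["CA-01", "CA-02", "CA-03", "CA-04"],
    {"CA-01", "CA-04"},
)
print(report)  # {'total': 4, 'targeted': 2, 'missing': ['CA-02', 'CA-03']}

# Same arithmetic for the production run: 436 districts, 350 with a target.
print(436 - 350)  # 86
```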

4. The weight file is confirmed byte-identical

w_district_calibration.npy (the file uploaded to HuggingFace and used by modal_app/local_area.py to publish district H5 files) is byte-identical to calibration_weights_20260131_153028.npy. This confirms the 588-target, 5,000-epoch run is the one that produced all published district files.
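Byte identity can be confirmed with a full content comparison or with matching SHA-256 digests. The sketch below runs on temporary toy files. To check the real data, pass the local paths of w_district_calibration.npy and calibration_weights_20260131_153028.npy. Those local locations are assumed and not given here.

```python
# Sketch: two ways to confirm that two files are byte-identical.
import filecmp
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def byte_identical(a: Path, b: Path) -> bool:
    # shallow=False compares contents instead of trusting os.stat() metadata.
    return filecmp.cmp(a, b, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "weights_original.npy")
    copy = Path(tmp, "weights_copy.npy")
    edited = Path(tmp, "weights_edited.npy")
    # Toy bytes, not real .npy arrays; the edited file has the same size.
    original.write_bytes(b"\x93NUMPY toy weight payload")
    copy.write_bytes(b"\x93NUMPY toy weight payload")
    edited.write_bytes(b"\x93NUMPY toy weight payloaD")

    same = byte_identical(original, copy)
    different = byte_identical(original, edited)
    hashes_match = sha256_of(original) == sha256_of(copy)
```

The hash route is handy when the two files live on different machines, because only the digests need to be compared.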

5. The database has 37,758 targets available

The policy_data.db contains 37,758 targets across 27 stratum groups. Because of the restrictive filter, only 588 of them (about 1.6%) were used, and 86 of the 436 CDs have no target at all.
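The database counts and the filtered count can be reproduced in SQL. This sketch uses an in-memory database. Its schema is hypothetical (a single `targets` table with `stratum_group_id` and `variable` columns), so policy_data.db's real layout may differ, and the toy rows are invented.

```python
# Sketch: count targets and apply the same OR filter in SQL (hypothetical schema).
import sqlite3

FILTER_VARIABLES = (
    "health_insurance_premiums_without_medicare_part_b",
    "snap",
    "state_income_tax",
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE targets (target_id INTEGER PRIMARY KEY, "
    "stratum_group_id INTEGER, variable TEXT)"
)
conn.executemany(
    "INSERT INTO targets (stratum_group_id, variable) VALUES (?, ?)",
    [
        (4, "household_count"),
        (7, "state_income_tax"),
        (2, "snap"),
        (2, "employment_income"),
        (5, "person_count"),
        (5, "household_count"),
    ],
)

(total,) = conn.execute("SELECT COUNT(*) FROM targets").fetchone()
(n_groups,) = conn.execute(
    "SELECT COUNT(DISTINCT stratum_group_id) FROM targets"
).fetchone()
(n_selected,) = conn.execute(
    "SELECT COUNT(*) FROM targets "
    "WHERE stratum_group_id IN (4, 7) OR variable IN (?, ?, ?)",
    FILTER_VARIABLES,
).fetchone()
conn.close()

print(total, n_groups, n_selected)  # 6 4 3

# Share of the real database used by the production run, in percent.
share_pct = round(100 * 588 / 37_758, 1)
print(share_pct)  # 1.6
```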

Impact

Every published district-level file on HuggingFace was produced by a calibration that matched only household counts at the CD level. CD-level income distributions, SNAP participation, tax collections, etc. have no CD-level targets: SNAP is targeted only at the state and national levels, and state income tax only at the state level. How those quantities split across districts therefore reflects whatever the stratified sampling and geography assignment happened to produce.
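A toy example shows why a household_count-only target leaves other CD-level quantities unconstrained. It has one district with three households and invented SNAP amounts, and it ignores the state-level constraints that tie districts together. Both weight vectors hit the same household count target exactly, yet they imply very different district SNAP totals.

```python
# Toy illustration (hypothetical numbers): one CD, three households.
snap = [0.0, 1200.0, 3000.0]   # annual SNAP per household (toy values)
target_households = 300.0      # the only CD-level target: household_count

candidate_weights = {
    "a": [100.0, 100.0, 100.0],
    "b": [200.0, 50.0, 50.0],
}


def weighted_total(weights: list[float], values: list[float]) -> float:
    """Sum of weight * value across households."""
    return sum(w * v for w, v in zip(weights, values))


household_totals = {k: sum(w) for k, w in candidate_weights.items()}
snap_totals = {k: weighted_total(w, snap) for k, w in candidate_weights.items()}

print(household_totals)  # {'a': 300.0, 'b': 300.0}  both meet the target
print(snap_totals)       # {'a': 420000.0, 'b': 210000.0}  SNAP differs 2x
```

The loss function sees both solutions as equally good, so which one the optimizer lands on is an artifact of initialization and sampling, not of any SNAP information.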

Context

The Arnold 2026 slides (CalibrationSlide.tsx) claimed 20,994 targets (34 target groups). No evidence supports this number in any actual calibration run. Maria's PR #489 ("Supporting all calibration targets") would have expanded coverage but hasn't been merged.

The new unified pipeline (PR #516) addresses this by using all 37,749+ DB-driven targets in a single solve.

@baogorek — does this match your understanding? Am I missing a different calibration run that used more CD-level targets? The only logs and weight files I found locally point to this 588-target run.
