Problem
The calibration pipeline (unified_calibration.py) and the stacked dataset builder (stacked_dataset_builder.py) must use the same takeup draws for their outputs to be consistent. Currently this works only because takeup re-randomization is disabled — both read the original draws from the base .h5. But if re-randomization is enabled in calibration (via rerandomize_takeup() or #532), the two will diverge:
- Calibration: generates new takeup draws per block/clone using seeded_rng(var_name, salt=block_geoid), builds the X matrix with those draws, and optimizes weights
- Stacked builder: loads the original base .h5, copies the original draws (not the re-randomized ones), and saves to output.h5
The resulting weights would be optimized for takeup patterns that don't exist in the final dataset.
What needs to happen
When the stacked dataset builder assembles each CD (around line 389, after geography is set and before to_input_dataframe()), it must call the same rerandomize_takeup() logic with the same seeds used during calibration. This means:
- After setting state_fips, county, block_geoid, etc. on cd_sim
- Call rerandomize_takeup(cd_sim, block_geoids, time_period) (or equivalent)
- The seeded RNG contract (seeded_rng(var_name, salt=block_geoid)) guarantees the same draws as calibration
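To illustrate why the seeded RNG contract makes this work, here is a minimal sketch. It does not use the repository's seeded_rng; demo_seeded_rng is a hypothetical stand-in that hashes (var_name, salt) into a seed, and the variable name and block GEOIDs are made-up placeholders.

```python
import hashlib
import random


def demo_seeded_rng(var_name: str, salt: str) -> random.Random:
    """Hypothetical stand-in for seeded_rng(var_name, salt=block_geoid).

    Assumption: the real helper derives its seed deterministically from
    the variable name and the salt. The hashing scheme here is illustrative.
    """
    digest = hashlib.sha256(f"{var_name}:{salt}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def demo_takeup_draws(var_name: str, block_geoid: str, n_units: int) -> list[float]:
    """Draw one uniform number per unit in the block, in a fixed order."""
    rng = demo_seeded_rng(var_name, block_geoid)
    return [rng.random() for _ in range(n_units)]


# Placeholder variable name and block GEOIDs, for illustration only.
calibration_draws = demo_takeup_draws("takes_up_demo", "060370001001000", 5)
builder_draws = demo_takeup_draws("takes_up_demo", "060370001001000", 5)
other_block_draws = demo_takeup_draws("takes_up_demo", "060370001001001", 5)

# Same (var_name, block_geoid) gives identical draws in both code paths.
print(calibration_draws == builder_draws)
# A different block gives a different stream.
print(calibration_draws == other_block_draws)
```

Under this kind of scheme the draws only line up if both code paths visit the same number of units per block in the same order, so the builder would need to preserve the entity ordering used during calibration.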
For the category-dependent variables (#532: EITC, WIC), the stacked builder would also need to run the simulation to determine categories before re-drawing.
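The ordering constraint for category-dependent variables can be sketched as follows. This is not the #532 implementation: the category names, the rates in DEMO_RATES, and both helper functions are hypothetical, and the sketch assumes takeup is decided by comparing a seeded uniform draw against a rate that depends on the unit's category.

```python
import hashlib
import random

# Hypothetical takeup rates by category; the real values are not given here.
DEMO_RATES = {"category_a": 0.65, "category_b": 0.85}


def demo_rng(var_name: str, salt: str) -> random.Random:
    """Hypothetical deterministic RNG keyed on variable name and salt."""
    digest = hashlib.sha256(f"{var_name}:{salt}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def demo_category_takeup(
    var_name: str, block_geoid: str, categories: list[str]
) -> list[bool]:
    """Decide takeup per unit; the threshold depends on each unit's category.

    The categories must already be known (for example, from a simulation run)
    before this function can be called.
    """
    rng = demo_rng(var_name, block_geoid)
    return [rng.random() < DEMO_RATES[category] for category in categories]


# Step 1 (assumed): categories come out of the simulation for this block.
simulated_categories = ["category_a", "category_b", "category_a"]

# Step 2: re-draw takeup using the same seed inputs as calibration.
calibration_takeup = demo_category_takeup(
    "takes_up_demo", "060370001001000", simulated_categories
)
builder_takeup = demo_category_takeup(
    "takes_up_demo", "060370001001000", simulated_categories
)
print(calibration_takeup == builder_takeup)
```

The point of the sketch is that matching seeds are not sufficient on their own: the builder also has to reproduce the same categories before re-drawing, otherwise the same uniform draws can be compared against different thresholds.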
Current state
- --skip-takeup-rerandomize is effectively a no-op (hardcoded skip at lines 1058-1062)
- The stacked builder has no re-randomization code at all
- This is self-consistent today but blocks enabling takeup re-randomization
Acceptance criteria
- Stacked dataset builder reproduces the same takeup draws used during calibration
- Both simple and category-dependent (#532) re-randomization are handled
- Test: build a single-CD .h5 and verify that its takeup draws match what the calibration matrix used
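
The comparison step of such a test might look like the sketch below. How the draws are read from the built .h5 and from the calibration run is not specified here, so both sides are represented as plain dicts mapping a takeup variable name to its list of draws; find_takeup_mismatches and the variable names are hypothetical.

```python
import math


def find_takeup_mismatches(
    calibration_draws: dict[str, list[float]],
    dataset_draws: dict[str, list[float]],
    tol: float = 0.0,
) -> list[str]:
    """Return the takeup variables whose draws differ between the two sources."""
    mismatched = []
    for var_name, expected in calibration_draws.items():
        actual = dataset_draws.get(var_name)
        # A missing variable or a length mismatch counts as a failure.
        if actual is None or len(actual) != len(expected):
            mismatched.append(var_name)
            continue
        # With tol=0.0 this is an exact, element-wise equality check.
        if any(
            not math.isclose(a, e, rel_tol=0.0, abs_tol=tol)
            for a, e in zip(actual, expected)
        ):
            mismatched.append(var_name)
    return mismatched


# Placeholder data standing in for draws extracted from each side.
calib = {"var_a": [0.1, 0.9], "var_b": [0.5, 0.5]}
built_ok = {"var_a": [0.1, 0.9], "var_b": [0.5, 0.5]}
built_bad = {"var_a": [0.1, 0.9], "var_b": [0.5, 0.4]}

print(find_takeup_mismatches(calib, built_ok))   # []
print(find_takeup_mismatches(calib, built_bad))  # ['var_b']
```

Returning the list of mismatched variables, rather than a single boolean, would let a failing test report which takeup variables diverged.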