Background
PR #531 introduced seeded takeup re-randomization for the census-block calibration pipeline. This issue tracks the follow-up work for the 3 variables that were deferred because they require simulation output to determine the correct rate.
How seeded re-randomization works during cloning
When we clone CPS records across census blocks, each clone gets a new geographic identity (block, CD, state). But the original takeup booleans (e.g., takes_up_snap_if_eligible) were drawn once during dataset construction using the household's original state. A household cloned into a different state should have takeup draws that reflect its new geography.
We solve this with seeded_rng(variable_name, salt=block_geoid):
- Deterministic by geography: All entities in the same census block get the same RNG seed, so the same household cloned to the same block always gets the same draw. This is essential for reproducibility and for the L0 optimizer to find stable weights.
- Varying across geography: Different blocks produce different seeds, so a household cloned to rural Alabama gets different takeup draws than the same household cloned to urban California — exactly the variation calibration needs to match local-area targets.
- Independent across variables: The variable name is part of the seed, so SNAP and Medicaid draws are uncorrelated even within the same block.
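The three properties above can be illustrated with a small stand-in for the seeding helper. This is a minimal sketch, not the pipeline's `seeded_rng`. `seeded_rng_sketch` and `draws` are hypothetical names. The sketch assumes the seed is derived by hashing the variable name together with the salt, and it uses the standard library's `random.Random` in place of whatever generator the real helper returns. The block GEOIDs are illustrative 15-digit values and are not real blocks.

```python
import hashlib
import random


def seeded_rng_sketch(variable_name: str, salt: str) -> random.Random:
    """Hypothetical stand-in for seeded_rng(variable_name, salt=...).

    Hashes the (variable name, salt) pair into a 64-bit seed, so the same
    pair always produces the same random stream.
    """
    # sha256 is stable across processes; Python's built-in hash() on str is
    # randomized per interpreter run and would break reproducibility.
    digest = hashlib.sha256(f"{variable_name}|{salt}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "little"))


def draws(variable_name: str, block_geoid: str, n: int) -> list:
    """Return n uniform(0, 1) draws for one variable in one block."""
    rng = seeded_rng_sketch(variable_name, salt=block_geoid)
    return [rng.random() for _ in range(n)]


# Illustrative block GEOIDs (state FIPS prefix 01 = Alabama, 06 = California).
BLOCK_AL = "010010201001000"
BLOCK_CA = "060375001001000"

same_block = draws("takes_up_snap_if_eligible", BLOCK_AL, 3)
other_block = draws("takes_up_snap_if_eligible", BLOCK_CA, 3)
other_var = draws("takes_up_medicaid_if_eligible", BLOCK_AL, 3)

assert same_block == draws("takes_up_snap_if_eligible", BLOCK_AL, 3)  # deterministic by block
assert same_block != other_block  # varies across geography
assert same_block != other_var    # independent across variables
```

A cryptographic hash is used here only because it is stable across runs and machines. Any deterministic mapping from (variable, block) to a seed would satisfy the same contract.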
For the 8 "simple" takeup variables (flat rate or state-specific rate), re-randomization is straightforward: look up the rate, draw uniform(0,1), compare. This is implemented in rerandomize_takeup() in unified_calibration.py.
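The look-up, draw, and compare steps for a simple variable can be sketched as follows. This sketch does not reproduce `rerandomize_takeup()`. `rerandomize_simple_sketch`, `seeded_rng_sketch`, and the state rates are hypothetical placeholders. The sketch assumes a state-specific rate can be keyed on the two-digit state FIPS prefix of the block GEOID. A flat rate is simply a function that ignores the GEOID.

```python
import hashlib
import random


def seeded_rng_sketch(variable_name, salt):
    # Hypothetical stand-in for seeded_rng: stable seed from (name, salt).
    digest = hashlib.sha256(f"{variable_name}|{salt}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "little"))


def rerandomize_simple_sketch(variable_name, block_geoids, rate_for_block):
    """Redraw a boolean takeup flag per entity using a flat or state rate.

    block_geoids: each entity's block GEOID, in a fixed entity order.
    rate_for_block: callable returning the takeup rate for a block GEOID.
    """
    takeup = [False] * len(block_geoids)
    # Group entity positions by block so each block uses its own seeded stream.
    positions_by_block = {}
    for i, geoid in enumerate(block_geoids):
        positions_by_block.setdefault(geoid, []).append(i)
    for geoid, positions in positions_by_block.items():
        rng = seeded_rng_sketch(variable_name, salt=geoid)
        rate = rate_for_block(geoid)
        for i in positions:
            takeup[i] = rng.random() < rate  # uniform(0, 1) compared with the rate
    return takeup


# Placeholder state-specific rates, keyed by the GEOID's 2-digit state prefix.
STATE_RATES = {"01": 0.8, "06": 0.7}
blocks = ["010010201001000", "010010201001000", "060375001001000"]
flags = rerandomize_simple_sketch(
    "takes_up_snap_if_eligible", blocks, lambda geoid: STATE_RATES[geoid[:2]]
)
```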
What's different about these 3 variables
These variables have rates that depend on entity-level categories determined by the simulation itself:
| Variable | Entity | Rate depends on |
|---|---|---|
| `takes_up_eitc` | tax unit | Number of qualifying children (0, 1, 2, 3+) |
| `would_claim_wic` | person | WIC category (infant, child 1-4, pregnant, postpartum, breastfeeding) |
| `is_wic_at_nutritional_risk` | person | WIC category + existing WIC receipt status |
You can't just look up a single rate — you need to run the simulation first to classify each entity, then apply the category-specific rate, then draw.
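The "apply the category-specific rate" step amounts to mapping each simulated category to a rate table key. The sketch below covers only that lookup. The rate values are placeholders and not real rates. It assumes the EITC table stores its "3+" bucket under the key `3`, and it assumes the WIC table is keyed by upper-case category strings. Both key formats are assumptions about what `load_take_up_rate` returns.

```python
def eitc_bucket(child_count):
    # Rates are keyed 0, 1, 2, 3+; this sketch assumes the "3+" key is 3.
    return min(child_count, 3)


def rates_for_entities(categories, rate_table, bucket=lambda c: c):
    """Map each entity's simulated category to its category-specific rate."""
    return [rate_table[bucket(c)] for c in categories]


# Placeholder rates, NOT real values; the pipeline would load them with
# load_take_up_rate("eitc", period) and load_take_up_rate("wic", period).
EITC_RATES = {0: 0.60, 1: 0.80, 2: 0.85, 3: 0.90}
WIC_RATES = {
    "INFANT": 0.90, "CHILD": 0.50, "PREGNANT": 0.60,
    "POSTPARTUM": 0.70, "BREASTFEEDING": 0.75,
}

# Categories that the clone's own simulation would have computed.
eitc_child_count = [0, 2, 5, 1]
wic_category_str = ["CHILD", "INFANT"]

print(rates_for_entities(eitc_child_count, EITC_RATES, eitc_bucket))
# [0.6, 0.85, 0.9, 0.8]
print(rates_for_entities(wic_category_str, WIC_RATES))
# [0.5, 0.9]
```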
Implementation approach
Inside _simulate_clone() (after the sim has been configured with the clone's geography and variables have been calculated):
- Calculate the category variable (e.g., `eitc_child_count`, `wic_category_str`)
- For each unique block, seed `seeded_rng(var_name, salt=block_geoid)`
- Draw uniforms, look up the rate for each entity's category, and compare
This means re-randomization happens per clone inside the simulation loop rather than once before it (as the simple variables do). The seeding contract is the same — same block, same variable, same draw — but the rate lookup is category-aware.
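The per-clone steps can be put together in a sketch like the one below. It is not the code inside `_simulate_clone()`. `rerandomize_by_category_sketch` and `seeded_rng_sketch` are hypothetical. The sketch assumes the caller has already mapped each entity (tax unit or person) to its household's block GEOID and has read the category variable from the clone's simulation. The EITC rates are placeholders.

```python
import hashlib
import random


def seeded_rng_sketch(variable_name, salt):
    # Hypothetical stand-in for seeded_rng: stable seed from (name, salt).
    digest = hashlib.sha256(f"{variable_name}|{salt}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "little"))


def rerandomize_by_category_sketch(
    variable_name, block_geoids, categories, rate_table, bucket=lambda c: c
):
    """Category-aware redraw for one clone, run after its simulation.

    block_geoids[i], categories[i]: block GEOID and simulated category
    (e.g. eitc_child_count or wic_category_str) of entity i.
    """
    if len(block_geoids) != len(categories):
        raise ValueError("block_geoids and categories must have equal length")
    result = [False] * len(categories)
    positions_by_block = {}
    for i, geoid in enumerate(block_geoids):
        positions_by_block.setdefault(geoid, []).append(i)
    for geoid, positions in positions_by_block.items():
        # Same seeding contract as the simple variables: (variable, block).
        rng = seeded_rng_sketch(variable_name, salt=geoid)
        for i in positions:
            u = rng.random()  # the uniform does not depend on the category
            result[i] = u < rate_table[bucket(categories[i])]
    return result


# Placeholder EITC rates; the "3+" bucket is assumed to be keyed as 3.
EITC_RATES = {0: 0.60, 1: 0.80, 2: 0.85, 3: 0.90}
blocks = ["010010201001000"] * 3 + ["060375001001000"]
child_counts = [0, 1, 4, 2]
takes_up_eitc = rerandomize_by_category_sketch(
    "takes_up_eitc", blocks, child_counts, EITC_RATES, lambda n: min(n, 3)
)
```

In this sketch, each entity's uniform depends only on the variable, the block, and the entity's position within the block. A different category therefore moves the threshold but not the draw. The same design also means the entity order within a block must stay stable across runs for the draws to be reproducible.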
Rate sources
- EITC: `load_take_up_rate("eitc", period)` returns a dict keyed by child count
- WIC: `load_take_up_rate("wic", period)` returns a dict keyed by WIC category string
- WIC nutritional risk: depends on both category and existing WIC receipt; rates TBD
Acceptance criteria
- `takes_up_eitc` re-randomized per clone using child-count-specific rates
- `would_claim_wic` re-randomized per clone using WIC-category-specific rates
- `is_wic_at_nutritional_risk` re-randomized per clone (rate logic TBD)
- Tests verify that draws are block-deterministic and category-aware
- Existing `rerandomize_takeup()` tests still pass
Ref: #531