Add category-dependent takeup re-randomization (EITC, WIC) #532

@baogorek

Background

PR #531 introduced seeded takeup re-randomization for the census-block calibration pipeline. This issue tracks the follow-up work for the 3 variables that were deferred because they require simulation output to determine the correct rate.

How seeded re-randomization works during cloning

When we clone CPS records across census blocks, each clone gets a new geographic identity (block, CD, state). But the original takeup booleans (e.g., takes_up_snap_if_eligible) were drawn once during dataset construction using the household's original state. A household cloned into a different state should have takeup draws that reflect its new geography.

We solve this with seeded_rng(variable_name, salt=block_geoid), which gives the draws three properties:

  1. Deterministic by geography: All entities in the same census block get the same RNG seed, so the same household cloned to the same block always gets the same draw. This is essential for reproducibility and for the L0 optimizer to find stable weights.
  2. Varying across geography: Different blocks produce different seeds, so a household cloned to rural Alabama gets different takeup draws than the same household cloned to urban California. This is exactly the variation calibration needs to match local-area targets.
  3. Independent across variables: The variable name is part of the seed, so SNAP and Medicaid draws are uncorrelated even within the same block.
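
The three properties can be illustrated with a small stand-in. This is a minimal sketch, not the pipeline's seeded_rng: the name seeded_rng_sketch, the SHA-256 seeding scheme, the use of the standard-library random module, the Medicaid variable name, and the example block GEOIDs are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import random


def seeded_rng_sketch(variable_name: str, salt: str) -> random.Random:
    """Hypothetical stand-in: a generator seeded only by (variable_name, salt)."""
    # hashlib gives the same digest in every process; the built-in hash()
    # is salted per process for strings, so it would break reproducibility.
    key = f"{variable_name}:{salt}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)


def draw_uniforms(variable_name: str, block_geoid: str, n: int) -> list[float]:
    """Draw n uniforms on [0, 1) from the block's seeded stream."""
    rng = seeded_rng_sketch(variable_name, block_geoid)
    return [rng.random() for _ in range(n)]


# Illustrative 15-digit block GEOIDs (not taken from the pipeline).
alabama_block = "010010201001000"
california_block = "060371011101000"

snap_al = draw_uniforms("takes_up_snap_if_eligible", alabama_block, 3)

# 1. Deterministic by geography: same variable and block, same draws.
assert snap_al == draw_uniforms("takes_up_snap_if_eligible", alabama_block, 3)
# 2. Varying across geography: a different block gives a different stream.
assert snap_al != draw_uniforms("takes_up_snap_if_eligible", california_block, 3)
# 3. Independent across variables: the variable name changes the seed.
#    (The Medicaid variable name here is illustrative.)
assert snap_al != draw_uniforms("takes_up_medicaid_if_eligible", alabama_block, 3)
```

Hashing the key, rather than calling Python's built-in hash(), keeps the seed stable between runs, which is what the reproducibility requirement needs.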

For the 8 "simple" takeup variables (flat rate or state-specific rate), re-randomization is straightforward: look up the rate, draw from uniform(0, 1), and mark the entity as taking up if the draw falls below the rate. This is implemented in rerandomize_takeup() in unified_calibration.py.
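
For the state-specific case, the lookup, draw, and compare steps might look like the sketch below. It is not the code in rerandomize_takeup(): the function names, the seeding scheme, and the rates are placeholders chosen for illustration. The one fixed fact it relies on is that the first two digits of a 15-digit census block GEOID are the state FIPS code.

```python
import hashlib
import random


def seeded_rng_sketch(variable_name, salt):
    """Hypothetical stand-in for seeded_rng(variable_name, salt=block_geoid)."""
    key = f"{variable_name}:{salt}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)


def rerandomize_simple_sketch(variable_name, block_geoid, n_entities,
                              rate_by_state_fips, default_rate):
    """Return one takeup boolean per entity in a single block."""
    # Look up the rate: the state FIPS code is the GEOID's first two digits.
    rate = rate_by_state_fips.get(block_geoid[:2], default_rate)
    # Draw and compare: takeup is True when the uniform falls below the rate.
    rng = seeded_rng_sketch(variable_name, block_geoid)
    return [rng.random() < rate for _ in range(n_entities)]


# Placeholder rates, chosen only to make the outcome obvious.
placeholder_rates = {"01": 0.0, "06": 1.0}

alabama = rerandomize_simple_sketch(
    "takes_up_snap_if_eligible", "010010201001000", 4, placeholder_rates, 0.5
)
california = rerandomize_simple_sketch(
    "takes_up_snap_if_eligible", "060371011101000", 4, placeholder_rates, 0.5
)
assert alabama == [False, False, False, False]
assert california == [True, True, True, True]
```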

What's different about these 3 variables

These variables have rates that depend on entity-level categories determined by the simulation itself:

Variable                   | Entity   | Rate depends on
---------------------------+----------+----------------------------------------------------------------------
takes_up_eitc              | tax unit | Number of qualifying children (0, 1, 2, 3+)
would_claim_wic            | person   | WIC category (infant, child 1-4, pregnant, postpartum, breastfeeding)
is_wic_at_nutritional_risk | person   | WIC category + existing WIC receipt status

You can't just look up a single rate: you need to run the simulation first to classify each entity, then apply the category-specific rate, then draw.

Implementation approach

Inside _simulate_clone() (after the sim has been configured with the clone's geography and variables have been calculated):

  1. Calculate the category variable (e.g., eitc_child_count, wic_category_str)
  2. For each unique block, create a generator with seeded_rng(var_name, salt=block_geoid)
  3. Draw uniforms, look up the rate for each entity's category, compare
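
The three steps above can be sketched as follows, assuming the category variable has already been calculated and is passed in as a plain list. The function names, the seeding scheme, and the rates are hypothetical; the real logic would live inside _simulate_clone() and use the pipeline's own seeded_rng.

```python
import hashlib
import random


def seeded_rng_sketch(var_name, salt):
    """Hypothetical stand-in for seeded_rng(var_name, salt=block_geoid)."""
    key = f"{var_name}:{salt}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)


def rerandomize_by_category_sketch(var_name, block_geoids, categories,
                                   rates_by_category):
    """Return one takeup boolean per entity.

    block_geoids and categories are parallel lists, one item per entity.
    """
    # Group entity positions by block so each block gets its own stream.
    positions_by_block = {}
    for i, geoid in enumerate(block_geoids):
        positions_by_block.setdefault(geoid, []).append(i)

    takeup = [False] * len(block_geoids)
    for geoid, positions in positions_by_block.items():
        rng = seeded_rng_sketch(var_name, geoid)
        for i in positions:
            # Draw first, then look up the rate: the uniform depends only on
            # the block and the entity's order within it, not on its category.
            u = rng.random()
            takeup[i] = u < rates_by_category[categories[i]]
    return takeup


# Placeholder rates keyed by child count, chosen to make the outcome obvious.
placeholder_rates = {0: 0.0, 1: 1.0}
blocks = ["010010201001000", "010010201001000", "060371011101000"]
child_counts = [0, 1, 1]

result = rerandomize_by_category_sketch(
    "takes_up_eitc", blocks, child_counts, placeholder_rates
)
assert result == [False, True, True]
```

Drawing the uniform before the rate lookup means a change in an entity's category moves only the threshold it is compared against, not the draw itself.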

This means re-randomization happens per clone inside the simulation loop, rather than once before the loop as it does for the simple variables. The seeding contract is unchanged (same block, same variable, same draw), but the rate lookup is category-aware.

Rate sources

  • EITC: load_take_up_rate("eitc", period) returns a dict keyed by child count
  • WIC: load_take_up_rate("wic", period) returns a dict keyed by WIC category string
  • WIC nutritional risk: Depends on both category and existing WIC receipt; rates TBD
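
A sketch of the category lookups, under assumptions the issue does not confirm: that the EITC dict uses integer keys with the largest key standing for the open-ended "3+" bucket, that WIC categories missing from the dict imply a rate of zero, and that the category strings look like the ones below. All rate values are placeholders, not real takeup rates, and the helper names are hypothetical.

```python
def eitc_rate_for(child_count, rates_by_children):
    """Map a child count onto the dict, capping at the top ("3+") bucket."""
    top_bucket = max(rates_by_children)  # assumed to represent "3+"
    return rates_by_children[min(child_count, top_bucket)]


def wic_rate_for(category, rates_by_category):
    """Return the category's rate, or 0.0 for categories not in the dict."""
    # Assumption: a person outside every WIC category never claims WIC.
    return rates_by_category.get(category, 0.0)


# Placeholder dicts standing in for load_take_up_rate("eitc", period) and
# load_take_up_rate("wic", period); the values and key spellings are made up.
placeholder_eitc = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
placeholder_wic = {"INFANT": 0.9, "CHILD": 0.5}

assert eitc_rate_for(2, placeholder_eitc) == 0.3
assert eitc_rate_for(5, placeholder_eitc) == 0.4  # 5 children falls in "3+"
assert wic_rate_for("INFANT", placeholder_wic) == 0.9
assert wic_rate_for("NONE", placeholder_wic) == 0.0
```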

Acceptance criteria

  • takes_up_eitc re-randomized per clone using child-count-specific rates
  • would_claim_wic re-randomized per clone using WIC-category-specific rates
  • is_wic_at_nutritional_risk re-randomized per clone (rate logic TBD)
  • Tests verify that draws are block-deterministic and category-aware
  • Existing rerandomize_takeup() tests still pass

Ref: #531
