
Stacked dataset builder must reproduce calibration takeup draws #547

@baogorek

Problem

The calibration pipeline (unified_calibration.py) and the stacked dataset builder (stacked_dataset_builder.py) must use the same takeup draws for their outputs to be consistent. Today they agree only because takeup re-randomization is disabled: both read the original draws from the base .h5. If re-randomization is enabled in calibration (via rerandomize_takeup() or #532), the two will diverge:

  • Calibration: generates new takeup draws per block/clone using seeded_rng(var_name, salt=block_geoid), builds the X matrix with those draws, and optimizes weights
  • Stacked builder: loads the original base .h5, copies the original draws (not the re-randomized ones), saves to output .h5
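To illustrate why the two paths can agree only if both derive draws from the same key, here is a minimal sketch of an RNG keyed by (var_name, block_geoid). The body of seeded_rng below is an assumption, not the pipeline's implementation; draw_takeup is a hypothetical helper, and the variable name and GEOID are illustrative placeholders.

```python
import hashlib
import random


def seeded_rng(var_name: str, salt: str) -> random.Random:
    """Hypothetical stand-in for seeded_rng: a stream keyed by (var_name, salt)."""
    # SHA-256 yields the same seed in every process; built-in hash() on str does not.
    digest = hashlib.sha256(f"{var_name}:{salt}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def draw_takeup(var_name: str, block_geoid: str, n: int, rate: float) -> list[bool]:
    """Hypothetical helper: n takeup flags for one block (True = takes up)."""
    rng = seeded_rng(var_name, salt=block_geoid)
    return [rng.random() < rate for _ in range(n)]


# Calibration and the stacked builder each derive draws from the same key...
calibration_draws = draw_takeup("takeup_var_example", "060371234001000", n=5, rate=0.8)
builder_draws = draw_takeup("takeup_var_example", "060371234001000", n=5, rate=0.8)

# ...so they match. Copying draws from the base .h5 bypasses this key entirely.
assert calibration_draws == builder_draws
```

Keying on a content hash rather than Python's hash() matters here because string hashing is randomized per interpreter run unless PYTHONHASHSEED is fixed.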

The resulting weights would be optimized for takeup patterns that don't exist in the final dataset.

What needs to happen

When the stacked dataset builder assembles each CD (around line 389, after geography is set and before to_input_dataframe()), it must call the same rerandomize_takeup() logic with the same seeds used during calibration. This means:

  1. After setting state_fips, county, block_geoid etc. on cd_sim
  2. Call rerandomize_takeup(cd_sim, block_geoids, time_period) (or equivalent)
  3. The seeded RNG contract (seeded_rng(var_name, salt=block_geoid)) guarantees the same draws as calibration
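The three steps above could look roughly like the following. This is a sketch under stated assumptions: cd_sim is modeled as a plain dict rather than a simulation object, TAKEUP_RATES and its values are placeholders, and the body of rerandomize_takeup is hypothetical (only its name and argument order follow this issue). The point it shows is that calibration and the builder share one draw function, so the per-block streams line up.

```python
import hashlib
import random

# Placeholder variables and rates, for illustration only.
TAKEUP_RATES = {"takeup_var_a": 0.8, "takeup_var_b": 0.6}


def seeded_rng(var_name: str, salt: str) -> random.Random:
    """Hypothetical stand-in for seeded_rng keyed by (var_name, salt)."""
    digest = hashlib.sha256(f"{var_name}:{salt}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def draws_for_blocks(var_name: str, rate: float, entity_blocks: list[str]) -> list[bool]:
    """One seeded stream per block, consumed in entity order within that block."""
    streams: dict[str, random.Random] = {}
    flags = []
    for block in entity_blocks:
        if block not in streams:
            streams[block] = seeded_rng(var_name, salt=block)
        flags.append(streams[block].random() < rate)
    return flags


def rerandomize_takeup(cd_sim: dict, block_geoids: list[str], time_period: int) -> None:
    """Hypothetical builder step: overwrite takeup flags after geography is set."""
    for var_name, rate in TAKEUP_RATES.items():
        cd_sim.setdefault(var_name, {})[time_period] = draws_for_blocks(
            var_name, rate, block_geoids
        )


# Entity-level block assignments for one CD (illustrative GEOIDs).
blocks = ["060371234001000", "060371234001000", "060371234002000"]

# Calibration side: the draws used to build the X matrix.
calibration = {v: draws_for_blocks(v, r, blocks) for v, r in TAKEUP_RATES.items()}

# Builder side: same blocks, same entity order, same function.
cd_sim: dict = {}
rerandomize_takeup(cd_sim, blocks, time_period=2024)
assert all(cd_sim[v][2024] == calibration[v] for v in TAKEUP_RATES)
```

In this sketch the match also depends on the builder visiting entities within each block in the same order as calibration; reordering entities would reassign draws even with identical seeds.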

For the category-dependent variables (#532: EITC, WIC), the stacked builder would also need to run the simulation to determine categories before re-drawing.
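For category-dependent takeup, one possible approach (an assumption, not something this issue specifies) is to draw one uniform per entity from the same seeded stream regardless of category, then compare it to the rate for the category the simulation assigned. The category codes and rates below are placeholders, and category_takeup is a hypothetical helper; in the real builder the categories would come from running the simulation on cd_sim.

```python
import hashlib
import random

# Placeholder rates keyed by a category the simulation would compute
# (e.g. a count of qualifying children); these are not real takeup rates.
PLACEHOLDER_RATES = {0: 0.5, 1: 0.7, 2: 0.8}


def seeded_rng(var_name: str, salt: str) -> random.Random:
    """Hypothetical stand-in for seeded_rng keyed by (var_name, salt)."""
    digest = hashlib.sha256(f"{var_name}:{salt}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def category_takeup(
    var_name: str,
    entity_blocks: list[str],
    categories: list[int],
    rates_by_category: dict[int, float],
) -> list[bool]:
    """Draw one uniform per entity, then threshold it at that entity's category rate."""
    streams: dict[str, random.Random] = {}
    flags = []
    for block, category in zip(entity_blocks, categories):
        if block not in streams:
            streams[block] = seeded_rng(var_name, salt=block)
        # Always consume exactly one draw per entity so the stream stays aligned.
        u = streams[block].random()
        flags.append(u < rates_by_category[category])
    return flags


blocks = ["060371234001000", "060371234001000", "060371234002000"]
categories = [0, 2, 1]  # in practice, computed by the simulation
flags = category_takeup("takeup_var_example", blocks, categories, PLACEHOLDER_RATES)
```

Because every entity consumes exactly one draw, changing one entity's category changes only that entity's threshold; the uniforms of the other entities in the block are unaffected. The builder still has to compute the same categories as calibration for the flags to match.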

Current state

  • --skip-takeup-rerandomize is effectively a no-op (the skip is hardcoded at lines 1058-1062)
  • The stacked builder has no re-randomization code at all
  • This is self-consistent today but blocks enabling takeup re-randomization

Acceptance criteria

Ref: #532, #531
