[PULL REQUEST] add logic for employment point-based xref#204
[PULL REQUEST] add logic for employment point-based xref#204bryce-sandag wants to merge 3 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new GIS-server SQL crosswalk that prefers EDD point-based job locations for mapping Census blocks to Series 15 MGRAs, falling back to area-overlap allocations when point data isn’t available, and wires this crosswalk into the employment aggregation pipeline.
Changes:
- Add
sql/employment/edd_land_use_split.sqlto generate block→MGRA allocations using EDD points with area-based fallback. - Update
python/employment.pyto query the new crosswalk fromGIS_ENGINEand usepct_eddvspct_areaduring LODES aggregation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sql/employment/edd_land_use_split.sql | New SQL script to build EDD point-based + area-based fallback block→MGRA allocation percentages. |
| python/employment.py | Switches employment module to use the new crosswalk and applies EDD/area allocation logic in aggregation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
python/employment.py:175
- Commented-out debug printing left in the module adds noise and is easy to accidentally re-enable. Please remove these lines (or replace with
utils.logger.debug(...)behind a config flag if ongoing observability is needed).
con=con,
params={
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Eric-Liu-SANDAG
left a comment
There was a problem hiding this comment.
For next review, please provide a [run_id] containing a full run of the Estimates Program so I can take a look at the output data. Also provide an older [run_id] so I can compare how the employment data has changed with the new methodology
Have run production database for 2020-2023 with Can you this code to check where jobs differences happen |
Eric-Liu-SANDAG
left a comment
There was a problem hiding this comment.
Code changes, pretty much some weird formatting notes:
- I thought we had to keep the
[EMPCORE]? - Why is
SUM(SUM([jobs])) OVER (PARTITION BY [CENSUSBLOCKS].[GEOID20]) AS [block_jobs]on two lines? - I would actually prefer
FULL OUTER JOINoverFULL JOIN. I feel includingOUTERbetter distinguishes it fromLEFTorRIGHT
Otherwise, the data itself likely has some errors, as I think some of the changes I am seeing are too large for such a small methodology change. It's probable that either 181 had issues to begin with, or the changes between 181 and 186 caused the issues. See query below:
WITH [jobs] AS (
SELECT [jobs].[run_id]
,[jobs].[year]
,[jobs].[mgra]
,[jobs].[naics_code]
,[jobs].[value] jobs_edd_xref
,X.[value] AS jobs_area_xref,
[X].[value] - [jobs].[value] AS [diff]
FROM [EstimatesProgram].[outputs].[jobs]
INNER JOIN
(SELECT
[run_id]
,[year]
,[mgra]
,[naics_code]
,[value]
FROM [EstimatesProgram].[outputs].[jobs]
WHERE run_id = 181 -- area xref
) AS X
ON [jobs].[year] = X.[year]
AND [jobs].[mgra] = X.[mgra]
AND [jobs].[naics_code] = X.[naics_code]
WHERE [jobs].[run_id] = 186 -- edd based xref
AND [jobs].[value] != X.[value]
), [agg] AS (
SELECT
[year],
[cpa],
SUM([jobs_edd_xref]) AS [jobs_edd_xref],
SUM([jobs_area_xref]) AS [jobs_area_xref],
SUM([diff]) AS [diff],
CASE
WHEN SUM([jobs_edd_xref]) = 0 THEN NULL
ELSE 100.0 * SUM([diff]) / SUM([jobs_edd_xref])
END AS [pct_change]
FROM [jobs]
INNER JOIN [demographic_warehouse].[dim].[mgra_denormalize]
ON [jobs].[mgra] = [mgra_denormalize].[mgra]
AND [mgra_denormalize].[series] = 15
GROUP BY [year], [cpa]
)
SELECT *
FROM [agg]
ORDER BY ABS([pct_change]) DESCEspecially focus on Del Mar Mesa and Torrey Highlands, but in general just verify the changes are expected
Oh wow. This definitely supports our use of EDD point-based xref over area-based. |
Line length would be 90 if included on same line I can make change to include |
GregorSchroeder
left a comment
There was a problem hiding this comment.
minor stuff, looks slick with [edd_flag]
There was a problem hiding this comment.
- The way that
GIS_ENGINEis specified through the secrets file and the comment in theREADME.mdon that file actually suggest that[GeoDepot]or[SPACECORE]should be the database supplied necessitating the use of[EMPCORE]in the queries. Recommend adding[EMPCORE]back in.
| / [CENSUSBLOCKS].[Shape].STArea()) | ||
| AS [pct_area] | ||
| FROM [GeoDepot].[sde].[CENSUSBLOCKS] | ||
| LEFT JOIN [GeoDepot].[sde].[MGRA15] |
There was a problem hiding this comment.
Agree with Eric regarding more explicit JOIN labels. LEFT OUTER JOIN
| END) AS [pct_edd], | ||
| SUM(ISNULL([xref_area].[pct_area], 0)) AS [pct_area] | ||
| FROM [xref_edd] | ||
| FULL JOIN [xref_area] |
There was a problem hiding this comment.
Agree with Eric regarding more explicit JOIN labels. FULL OUTER JOIN
| [employment] * ISNULL([headquarters].[share], 1) AS [jobs], | ||
| ISNULL([headquarters].[shape], [businesses].[shape]) AS [SHAPE] | ||
| FROM [ca_edd].[businesses] | ||
| LEFT JOIN [ca_edd].[headquarters] |
There was a problem hiding this comment.
Agree with Eric regarding more explicit JOIN labels. LEFT OUTER JOIN
| [employment] * ISNULL([headquarters].[share], 1) AS [jobs], | ||
| ISNULL([headquarters].[shape], [businesses].[shape]) AS [SHAPE] | ||
| FROM [ca_edd].[businesses] | ||
| LEFT JOIN [ca_edd].[headquarters] |
There was a problem hiding this comment.
Agree with Eric regarding more explicit JOIN labels. LEFT OUTER JOIN
| SUM(CASE WHEN ISNULL([xref_edd].[block_jobs], 0) = 0 THEN 0 | ||
| ELSE 1.0 * ISNULL([xref_edd].[mgra_jobs], 0) | ||
| / ISNULL([xref_edd].[block_jobs], 0) | ||
| END) AS [pct_edd], |
There was a problem hiding this comment.
CASE statement formatting looks off
SUM(
CASE
WHEN ISNULL([xref_edd].[block_jobs], 0) = 0 THEN 0
ELSE 1.0 * ISNULL([xref_edd].[mgra_jobs], 0) / ISNULL([xref_edd].[block_jobs], 0)
END
) AS [pct_edd],| [mgra], | ||
| CASE WHEN SUM([pct_edd]) OVER (PARTITION BY [block]) > 0 | ||
| THEN [pct_edd] * 1/SUM([pct_edd]) OVER (PARTITION BY [block]) | ||
| ELSE 0 END AS [pct_edd], |
There was a problem hiding this comment.
CASE statement formatting looks off
CASE
WHEN SUM([pct_edd]) OVER (PARTITION BY [block]) > 0
THEN [pct_edd] * 1/SUM([pct_edd]) OVER (PARTITION BY [block])
ELSE 0
END AS [pct_edd],| FROM | ||
| [results] | ||
| WHERE | ||
| [mgra] IS NOT NULL |
There was a problem hiding this comment.
formatting is off
CASE
WHEN SUM([pct_edd]) OVER (PARTITION BY [block])> 0 THEN 1
ELSE 0
END AS [edd_flag]
FROM [results]
WHERE [mgra] IS NOT NULL| [mgra] | ||
| END | ||
|
|
||
| --Drop Temp table |
There was a problem hiding this comment.
probably don't need this comment at all but missing space
| @@ -96,7 +96,7 @@ def _aggregate_lodes_to_mgra( | |||
|
|
|||
There was a problem hiding this comment.
should add a paragraph below the one-liner with a high level description of how we use EDD first then fail over to area-based

Describe this pull request. What changes are being made?
Add in sql script and update logic in employment module to handle employment point-based cross reference from block to mgra where possible, otherwise default to area based cross reference
What issues does this pull request address?
close #198