
fix(cll): resolve column lineage for recursive CTEs#1188

Merged
wcchang1115 merged 5 commits into main from
feature/drc-2890-cll-recursive-cte-lineage
Mar 6, 2026

Conversation

@danyelf
Contributor

@danyelf danyelf commented Mar 6, 2026

Summary

  • Fix recursive CTEs (WITH RECURSIVE) showing "unknown" lineage for all columns instead of tracing back to source tables
  • Fix CllColumnDep positional-arg crash that caused the entire model to degrade to "unknown" when any single column was unresolvable (e.g. correlated scalar subqueries, LATERAL joins, PIVOT, UNNEST)
  • Add _resolve_scope_cll() helper to handle sqlglot's phantom stub Scopes for self-referential CTE references

Root cause

sqlglot's traverse_scope creates a stub Scope for recursive CTE self-references that shares the same expression object as the base case but isn't yielded in the scope list. scope_cll_map.get(stub) returned None, falling through to a crash path.
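A minimal sketch of the fallback idea, assuming only what the paragraph above states: the stub and the real base-case scope are different objects that share one expression object. The `Scope` class here is a hypothetical stand-in for sqlglot's Scope, and `resolve_scope_cll` is an illustration of identity matching, not the actual `_resolve_scope_cll()` from this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity-based hashing: each instance is a distinct dict key
class Scope:
    """Hypothetical stand-in for sqlglot's Scope; only `expression` matters here."""

    expression: object


def resolve_scope_cll(
    scope: Scope, scope_cll_map: Dict[Scope, List[str]]
) -> Optional[List[str]]:
    """Look up lineage for a scope, falling back to expression identity."""
    direct = scope_cll_map.get(scope)
    if direct is not None:
        return direct
    # Fallback: the stub is not a key in the map, but it wraps the same
    # expression object as a scope that was processed.
    for candidate, cll in scope_cll_map.items():
        if candidate.expression is scope.expression:
            return cll
    return None


base_case_select = object()  # placeholder for the base-case SELECT expression
base_scope = Scope(expression=base_case_select)  # yielded and processed
stub_scope = Scope(expression=base_case_select)  # stub: never yielded

scope_cll_map = {base_scope: ["stg_categories.id"]}
```

Here `scope_cll_map.get(stub_scope)` returns None, mirroring the failure described above, while `resolve_scope_cll(stub_scope, scope_cll_map)` finds the lineage recorded for the base case.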

Test plan

  • test_recursive_cte — full category hierarchy recursive CTE traces all columns to stg_categories
  • test_recursive_cte_m2c_propagation — WHERE clause deps from base case propagate through recursive branch
  • test_unresolvable_column_does_not_crash_entire_model — correlated scalar subquery degrades gracefully; sibling columns still trace correctly
  • All 60 CLL tests pass (43 unit + 16 integration + 1 new)
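For readers unfamiliar with the query shape these tests exercise, here is a rough sketch of a category-hierarchy recursive CTE, run against an in-memory SQLite database. Only the table name stg_categories comes from the test plan; the column names (id, name, parent_id), the CTE name category_tree, and the sample rows are assumptions made for illustration, and the actual test SQL may differ.

```python
import sqlite3

RECURSIVE_CTE_SQL = """
WITH RECURSIVE category_tree AS (
    -- base case: root categories
    SELECT id, name, parent_id, 0 AS depth
    FROM stg_categories
    WHERE parent_id IS NULL
    UNION ALL
    -- recursive branch: references category_tree itself
    SELECT c.id, c.name, c.parent_id, t.depth + 1
    FROM stg_categories AS c
    JOIN category_tree AS t ON c.parent_id = t.id
)
SELECT id, name, depth FROM category_tree ORDER BY depth, id
"""


def run_example():
    """Build a three-row hierarchy and walk it with the recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stg_categories (id INTEGER, name TEXT, parent_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO stg_categories VALUES (?, ?, ?)",
        [(1, "root", None), (2, "child", 1), (3, "grandchild", 2)],
    )
    rows = conn.execute(RECURSIVE_CTE_SQL).fetchall()
    conn.close()
    return rows
```

Every output column of category_tree ultimately comes from stg_categories, which is the lineage the recursive CTE test expects to see. The self-reference in the second SELECT is the part that produces the stub scope.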

Closes DRC-2890
Related: DRC-2891 (audit of remaining complex SQL patterns)

🤖 Generated with Claude Code

… on unresolvable columns

Recursive CTEs (WITH RECURSIVE) showed "unknown" lineage for all columns
because sqlglot creates a stub Scope for self-referential CTE references
that isn't yielded by traverse_scope. The CLL parser couldn't find this
stub in scope_cll_map, causing the entire model to degrade to "unknown".

Changes:
- Add _resolve_scope_cll() helper that falls back to expression identity
  matching when direct scope lookup fails, resolving the stub to the real
  processed base case scope
- Apply the helper to both column lineage resolution and model-to-column
  dependency propagation
- Fix CllColumnDep positional args (Pydantic model) to use keyword args,
  preventing TypeError crashes when columns can't be resolved — now only
  the unresolvable column degrades instead of the entire model

Closes: DRC-2890

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
@codecov

codecov bot commented Mar 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
|---|---|
| recce/util/cll.py | 91.26% <100.00%> (+8.91%) ⬆️ |
| tests/util/test_cll.py | 100.00% <100.00%> (ø) |

... and 1 file with indirect coverage changes

@danyelf
Contributor Author

danyelf commented Mar 6, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@danyelf danyelf self-assigned this Mar 6, 2026
@danyelf danyelf requested a review from even-wei March 6, 2026 04:30
Contributor

Copilot AI left a comment

Pull request overview

Improves Recce’s column-level lineage (CLL) extraction so recursive CTEs (WITH RECURSIVE) and partially unresolvable columns no longer degrade lineage results to “unknown” for the whole model.

Changes:

  • Add scope-resolution fallback (_resolve_scope_cll) to handle sqlglot’s stub Scope objects created for recursive CTE self-references.
  • Fix CllColumnDep instantiation to use keyword arguments (node=, column=), avoiding positional-arg crashes.
  • Add new CLL tests covering recursive CTE column lineage and model-to-column dependency propagation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| recce/util/cll.py | Adds stub-scope fallback lookup and hardens dependency collection to avoid crashes and support recursive CTE scope resolution. |
| tests/util/test_cll.py | Adds regression tests for recursive CTE lineage and for ensuring one unresolvable column doesn’t break sibling column lineage. |

gcko and others added 3 commits March 6, 2026 12:43
…mn test

Address Copilot review feedback: add missing assert_column for child_count
so the test actually guards the regression, not just implicitly checks for
no exception.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jared Scott <jared.scott@datarecce.io>
- Remove unused _cll_column (never called anywhere)
- Fix imprecise comment about UNION ALL marking all columns as derived
- Add missing child_count assertion in unresolvable column test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
@danyelf danyelf requested a review from wcchang1115 March 6, 2026 05:59
Collaborator

@wcchang1115 wcchang1115 left a comment

LGTM, thanks!

@wcchang1115 wcchang1115 merged commit 9cd053c into main Mar 6, 2026
19 checks passed
@wcchang1115 wcchang1115 deleted the feature/drc-2890-cll-recursive-cte-lineage branch March 6, 2026 08:32