fix(cll): resolve column lineage for recursive CTEs #1188
Merged
wcchang1115 merged 5 commits into main on Mar 6, 2026
… on unresolvable columns

Recursive CTEs (`WITH RECURSIVE`) showed "unknown" lineage for all columns because sqlglot creates a stub `Scope` for self-referential CTE references that isn't yielded by `traverse_scope`. The CLL parser couldn't find this stub in `scope_cll_map`, so the entire model degraded to "unknown".

Changes:
- Add a `_resolve_scope_cll()` helper that falls back to expression identity matching when direct scope lookup fails, resolving the stub to the real, processed base-case scope
- Apply the helper to both column lineage resolution and model-to-column dependency propagation
- Construct `CllColumnDep` (a Pydantic model) with keyword args instead of positional args, preventing `TypeError` crashes when columns can't be resolved; now only the unresolvable column degrades instead of the entire model

Closes: DRC-2890

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
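A minimal sketch of the stub-scope problem and the identity fallback, written without sqlglot so it runs standalone. `Expression`, `Scope`, `traverse`, and `resolve_scope_cll` are hypothetical stand-ins, not Recce's `_resolve_scope_cll()` or sqlglot's classes, and the SQL string is illustrative. The sketch assumes, consistent with `scope_cll_map.get(stub)` returning `None`, that scopes act as dict keys by object identity; because the stub shares the base case's `expression` object, matching on that object recovers the processed scope.

```python
class Expression:
    """Hypothetical stand-in for a parsed SQL node, e.g. the base-case SELECT."""

    def __init__(self, sql: str) -> None:
        self.sql = sql


class Scope:
    """Hypothetical stand-in for a scope. No __eq__/__hash__ override, so dict
    lookups match by object identity."""

    def __init__(self, expression: Expression, label: str) -> None:
        self.expression = expression
        self.label = label


def traverse(base_case: Expression):
    """Mimic the traversal described in the commit message: the stub for the
    recursive self-reference is built from the same base-case expression object,
    but only the real base-case scope is yielded (returned in the list)."""
    stub = Scope(base_case, "stub")
    real = Scope(base_case, "base_case")
    return [real], stub


def resolve_scope_cll(scope: Scope, scope_cll_map: dict):
    """Direct lookup first; if that misses, fall back to the processed scope
    whose expression is the very same object (identity, not equality)."""
    cll = scope_cll_map.get(scope)
    if cll is not None:
        return cll
    for processed_scope, processed_cll in scope_cll_map.items():
        if processed_scope.expression is scope.expression:
            return processed_cll
    return None


base = Expression("SELECT id, parent_id, name FROM stg_categories WHERE parent_id IS NULL")
yielded, stub = traverse(base)
# Lineage is only computed for scopes the traversal yielded.
scope_cll_map = {s: {"id": "stg_categories.id"} for s in yielded}

print(scope_cll_map.get(stub))                 # None: the stub was never processed
print(resolve_scope_cll(stub, scope_cll_map))  # {'id': 'stg_categories.id'}
```

Comparing with `is` rather than `==` is the conservative choice if the real expression type defines structural equality: otherwise two textually identical subqueries elsewhere in the model could be matched to the wrong scope.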
Codecov Report
✅ All modified and coverable lines are covered by tests.
... and 1 file with indirect coverage changes
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code
Pull request overview
Improves Recce’s column-level lineage (CLL) extraction so recursive CTEs (WITH RECURSIVE) and partially unresolvable columns no longer degrade lineage results to “unknown” for the whole model.
Changes:
- Add a scope-resolution fallback (`_resolve_scope_cll`) to handle sqlglot’s stub `Scope` objects created for recursive CTE self-references.
- Fix `CllColumnDep` instantiation to use keyword arguments (`node=`, `column=`), avoiding positional-arg crashes.
- Add new CLL tests covering recursive CTE column lineage and model-to-column dependency propagation.
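A sketch of the positional-arg failure and of per-column degradation, under stated assumptions: `CllColumnDep` here is a hypothetical keyword-only stand-in that rejects positional arguments the way a Pydantic model's `__init__` does, and `resolve_column`, `build_lineage`, `ColumnLineage`, the `"passthrough"` label, and the `category_id` column are illustrative names, not Recce's API. Catching the resolution failure per column, rather than around the whole model, is one way to get the behavior this PR describes.

```python
from dataclasses import dataclass, field


class CllColumnDep:
    """Hypothetical stand-in: keyword-only fields, mirroring how a Pydantic
    model's __init__ rejects positional arguments with TypeError."""

    def __init__(self, *, node: str, column: str) -> None:
        self.node = node
        self.column = column

    def __repr__(self) -> str:
        return f"CllColumnDep(node={self.node!r}, column={self.column!r})"


@dataclass
class ColumnLineage:
    """Hypothetical per-column result."""
    name: str
    transformation_type: str
    depends_on: list = field(default_factory=list)


def resolve_column(name: str, sources: dict) -> list:
    """Hypothetical resolver; raises KeyError for a column it cannot trace."""
    node, column = sources[name]
    # Keyword arguments, as in the fix. CllColumnDep(node, column) would raise TypeError.
    return [CllColumnDep(node=node, column=column)]


def build_lineage(columns: list, sources: dict) -> dict:
    """Resolve each column independently so a failure degrades only that column."""
    result = {}
    for name in columns:
        try:
            result[name] = ColumnLineage(name, "passthrough", resolve_column(name, sources))
        except KeyError:
            result[name] = ColumnLineage(name, "unknown")
    return result


try:
    CllColumnDep("stg_categories", "category_id")  # positional: rejected
except TypeError as exc:
    print("TypeError:", exc)

sources = {"category_id": ("stg_categories", "category_id")}
lineage = build_lineage(["category_id", "child_count"], sources)
print(lineage["category_id"].depends_on)           # [CllColumnDep(node='stg_categories', column='category_id')]
print(lineage["child_count"].transformation_type)  # unknown
```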
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `recce/util/cll.py` | Adds stub-scope fallback lookup and hardens dependency collection to avoid crashes and support recursive CTE scope resolution. |
| `tests/util/test_cll.py` | Adds regression tests for recursive CTE lineage and for ensuring one unresolvable column doesn’t break sibling column lineage. |
…mn test

Address Copilot review feedback: add the missing `assert_column` for `child_count` so the test actually guards the regression instead of only implicitly checking that no exception is raised.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jared Scott <jared.scott@datarecce.io>
- Remove unused `_cll_column` (never called anywhere)
- Fix imprecise comment about `UNION ALL` marking all columns as derived
- Add missing `child_count` assertion in unresolvable column test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Danyel Fisher <danyel@gmail.com>
Summary
- Fix recursive CTEs (`WITH RECURSIVE`) showing "unknown" lineage for all columns instead of tracing back to source tables
- Fix a `CllColumnDep` positional-arg crash that caused the entire model to degrade to "unknown" when any single column was unresolvable (e.g. correlated scalar subqueries, `LATERAL` joins, `PIVOT`, `UNNEST`)
- Add a `_resolve_scope_cll()` helper to handle sqlglot's phantom stub Scopes for self-referential CTE references

Root cause
sqlglot's `traverse_scope` creates a stub `Scope` for recursive CTE self-references that shares the same `expression` object as the base case but isn't yielded in the scope list. `scope_cll_map.get(stub)` therefore returned `None`, falling through to a crash path.

Test plan
- `test_recursive_cte`: a full category-hierarchy recursive CTE traces all columns to `stg_categories`
- `test_recursive_cte_m2c_propagation`: WHERE clause deps from the base case propagate through the recursive branch
- `test_unresolvable_column_does_not_crash_entire_model`: a correlated scalar subquery degrades gracefully; sibling columns still trace correctly

Closes DRC-2890
Related: DRC-2891 (audit of remaining complex SQL patterns)
🤖 Generated with Claude Code