DRC-2820: PoC scenario detection and base mode classification #2
Open
…tion
Add detection heuristic prototype that analyzes dbt manifest.json to
recommend shared base vs isolated base mode for Recce CI. The core
insight: projects with incremental models need isolated base to avoid
false alarm row count diffs.
- detect_base_mode.py: parses manifest, classifies by materialization
type, event_time coverage, and project scale
- compare_environments.py: compares row counts between base/current
schemas to validate false alarm patterns
- metrics_daily_shipments.sql: test incremental model demonstrating the
data divergence scenario
Validated: 51/51 tables match with shared base (no incremental),
1 mismatch (-50.7%) when incremental model present.
Refs: DRC-2820
Signed-off-by: even-wei <evenwei@infuseai.io>
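The row count comparison described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in compare_environments.py: the function name `row_count_diffs` and the counts are made up, and the real script reads counts from PostgreSQL schemas rather than from dictionaries.

```python
def row_count_diffs(base, current):
    """Return {table: percent change} for tables whose row counts differ.

    `base` and `current` map table names to row counts (hypothetical inputs).
    A value of None means the table is missing on one side, or the base
    count is zero so no percentage can be computed.
    """
    diffs = {}
    for table in sorted(set(base) | set(current)):
        b = base.get(table)
        c = current.get(table)
        if b == c:
            continue  # identical counts: not a mismatch
        if b is None or c is None or b == 0:
            diffs[table] = None
        else:
            diffs[table] = round((c - b) / b * 100, 1)
    return diffs


# Made-up counts, for illustration only.
base_counts = {"orders": 1500, "metrics_daily_shipments": 1000}
current_counts = {"orders": 1500, "metrics_daily_shipments": 493}
mismatches = row_count_diffs(base_counts, current_counts)
print(mismatches)
```

Only tables with differing counts appear in the result, so an empty dictionary corresponds to the "all tables match" case.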
…lation
The root cause of false alarms is NOT "incremental models accumulate data."
It is conditional logic (is_incremental(), current_date(), target.name,
{{ this }}) that produces different SQL depending on build context. Two
environments built under different conditions run different queries against
the same source → different results → false alarm diffs.
- detect_base_mode.py: reframed signal explanations from "data
accumulation" to "conditional logic produces non-deterministic SQL";
added snapshot model detection alongside incremental
- metrics_daily_shipments.sql: added target-dependent else branch
(pg-base gets 365 days, others get 90 days) to demonstrate the
real-world conditional fork pattern seen in fct_cmab_strategy_reward
Validated: pg-base (365d) vs pg-current (90d) produces -4.8% row diff
on the conditional model, while all 50 deterministic models match.
Refs: DRC-2820
Signed-off-by: even-wei <evenwei@infuseai.io>
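As a rough sketch of the manifest-based signal mentioned above (incremental plus snapshot detection), assuming the standard dbt manifest layout where each entry under `nodes` carries `resource_type` and `config.materialized`. The function names, the sample manifest, and the simple "any stateful model means isolated" rule are assumptions for illustration, not the logic in detect_base_mode.py.

```python
import json

# Materializations treated as signals; limiting the check to these two is an assumption.
STATEFUL = {"incremental", "snapshot"}


def find_stateful_models(manifest):
    """Return sorted (unique_id, materialization) pairs for stateful nodes."""
    flagged = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") not in ("model", "snapshot"):
            continue  # skip tests, seeds, and other node types
        materialized = node.get("config", {}).get("materialized")
        if materialized in STATEFUL:
            flagged.append((unique_id, materialized))
    return sorted(flagged)


def recommend_base_mode(manifest):
    """Hypothetical rule: any stateful model suggests an isolated base."""
    return "isolated" if find_stateful_models(manifest) else "shared"


# Tiny hand-written manifest fragment; the project name "tpch" is illustrative.
manifest = json.loads("""
{
  "nodes": {
    "model.tpch.orders": {
      "resource_type": "model",
      "config": {"materialized": "table"}
    },
    "model.tpch.metrics_daily_shipments": {
      "resource_type": "model",
      "config": {"materialized": "incremental"}
    },
    "test.tpch.not_null_orders_id": {
      "resource_type": "test",
      "config": {"materialized": "test"}
    }
  }
}
""")

print(find_stateful_models(manifest))
print(recommend_base_mode(manifest))
```

Against a real project, the same functions could be fed the parsed contents of `target/manifest.json`.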
…ialization
Add models demonstrating that false alarms are caused by
non-deterministic SQL patterns (target.name, current_date), not by
materialization type:
- metrics_regional_revenue (table): target.name date window → -68.8% mismatch
- metrics_shipping_efficiency (table): target.name branching → -68.8% mismatch
- metrics_order_summary (view): target.name date window → -78.7% mismatch
- metrics_daily_orders (incremental, deterministic else): 0% match, safe
Detection script v2: scans raw SQL for non-deterministic patterns
instead of checking materialization type. Correctly flags 4/4
problematic models, correctly marks safe incremental as safe. Zero
false positives, zero false negatives.
Compare script: now includes views in row count comparison.
Signed-off-by: even-wei <evenwei@infuseai.io>
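A minimal sketch of the raw-SQL scan idea, assuming only the two patterns named in this commit (target.name and current_date). The pattern table, the function name `scan_raw_sql`, and both sample models are hypothetical; the actual script may use a different pattern list.

```python
import re

# Patterns named in the commit message; the regexes themselves are assumptions.
NON_DETERMINISTIC_PATTERNS = {
    "target.name": re.compile(r"\btarget\.name\b"),
    "current_date": re.compile(r"\bcurrent_date\b", re.IGNORECASE),
}


def scan_raw_sql(raw_sql):
    """Return the sorted names of non-deterministic patterns found in raw_sql."""
    return sorted(
        name
        for name, pattern in NON_DETERMINISTIC_PATTERNS.items()
        if pattern.search(raw_sql)
    )


# Hypothetical model: the date window depends on the build target.
conditional_model = """
select * from orders
where order_date >= current_date
  - {% if target.name == 'pg-base' %}365{% else %}90{% endif %}
"""

# Hypothetical incremental model whose branches do not depend on build context.
safe_incremental_model = """
select * from orders
{% if is_incremental() %}
where order_date > (select max(order_date) from {{ this }})
{% endif %}
"""

print(scan_raw_sql(conditional_model))
print(scan_raw_sql(safe_incremental_model))
```

The second model is incremental but matches neither pattern, which mirrors the point that materialization type alone does not decide the outcome.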
Prototype two detection methods for non-deterministic dbt models:
1. compiled_sql_diff.py: Compiles under two targets, normalizes schema
names and batch metadata, diffs remaining SQL. Requires
--full-refresh to catch incremental else branches.
2. compare_detection_approaches.py: Runs both Jinja scanning and
compiled SQL diffing side-by-side, comparing accuracy against
ground truth.
Key findings:
- Both approaches achieve 100% accuracy on dbt-tpch (73 models)
- Compiled SQL diff needs --full-refresh for incremental models
- Schema normalization must be precise (db.schema.table only)
- dbt_batch_id/ts must be stripped as compile-time artifacts
Relates to DRC-2863
Signed-off-by: even-wei <evenwei@infuseai.io>
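The normalize-then-diff step can be sketched as below. This is an illustration under stated assumptions, not compiled_sql_diff.py itself: the helper names, the `<schema>` and `<batch>` placeholders, and the assumed literal form `'...' as dbt_batch_id` / `'...' as dbt_batch_ts` are all hypothetical.

```python
import difflib
import re


def normalize(sql, database, schema):
    """Mask target-specific details so only real logic differences remain."""
    # Replace only fully qualified db.schema. prefixes (optionally quoted),
    # so a bare word that happens to equal the schema name is left alone.
    qualified = re.compile(
        r'"?{}"?\."?{}"?\.'.format(re.escape(database), re.escape(schema))
    )
    sql = qualified.sub("<schema>.", sql)
    # Mask batch metadata literals (assumed form: '<value>' as dbt_batch_id).
    sql = re.sub(
        r"'[^']*'\s+as\s+(dbt_batch_id|dbt_batch_ts)",
        r"'<batch>' as \1",
        sql,
        flags=re.IGNORECASE,
    )
    return sql


def compiled_sql_diff(base_sql, current_sql, base_target, current_target):
    """Return unified diff lines between the two normalized compiled queries."""
    a = normalize(base_sql, *base_target).splitlines()
    b = normalize(current_sql, *current_target).splitlines()
    return list(difflib.unified_diff(a, b, lineterm=""))


# Hypothetical compiled SQL for one deterministic and one conditional model.
base_target = ("db", "base")
current_target = ("db", "current")

deterministic = compiled_sql_diff(
    'select id, \'b1\' as dbt_batch_id from "db"."base"."orders"',
    'select id, \'b2\' as dbt_batch_id from "db"."current"."orders"',
    base_target,
    current_target,
)
conditional = compiled_sql_diff(
    'select * from "db"."base"."orders"\nwhere order_date >= current_date - 365',
    'select * from "db"."current"."orders"\nwhere order_date >= current_date - 90',
    base_target,
    current_target,
)
print(len(deterministic), len(conditional))
```

An empty diff after normalization means the two targets compiled to the same logic; any remaining lines point at a build-context-dependent branch.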
Summary
manifest.json to recommend shared vs isolated base mode
Changes
- scripts/detect_base_mode.py: Parses manifest for incremental models, event_time coverage, materialization mix, and project scale to classify the project
- scripts/compare_environments.py: Compares row counts between base/current PostgreSQL schemas
- models/metrics/metrics_daily_shipments.sql: Incremental model (delete+insert) for testing the isolated base scenario
Test plan
Size
+485/-0 across 3 files
Refs: DRC-2820
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>