diff --git a/docs/4-what-the-agent-does/automated-validation.md b/docs/4-what-the-agent-does/automated-validation.md
new file mode 100644
index 0000000..a5b4290
--- /dev/null
+++ b/docs/4-what-the-agent-does/automated-validation.md
@@ -0,0 +1,61 @@
+---
+title: Automated Validation
+---
+
+# Automated Validation
+
+Manual data validation slows down every pull request. Developers must remember which checks to run, execute them correctly, and communicate results to reviewers. The Recce Agent automates this process, running the right validation checks based on what changed in your PR.
+
+## How It Works
+
+When a PR is opened or updated, the Recce Agent analyzes your changes and determines what needs validation.
+
+### 1. PR Triggers the Agent
+
+Your CI/CD pipeline runs `recce-cloud upload` when dbt metadata is updated. This triggers the agent to analyze the changes.
+
+### 2. Agent Analyzes Changes
+
+The agent reads dbt artifacts from both your base branch and PR branch. It identifies:
+
+- Which models were modified
+- What schema changes occurred
+- Which downstream models are affected
+
+### 3. Agent Runs Validation
+
+Based on the analysis, the agent executes appropriate validation checks against your warehouse:
+
+- **Schema diff** - Detects added, removed, or modified columns
+- **Row count diff** - Compares record counts between branches
+- **Profile diff** - Analyzes statistical changes in column values
+- **Breaking change analysis** - Identifies changes that affect downstream models
+
+### 4. Agent Posts Summary
+
+The agent generates a data review summary and posts it directly to your PR. Reviewers see:
+
+- What changed and why it matters
+- Validation results with pass/fail status
+- Recommended actions for review
+
+## When to Use
+
+- **Every PR that modifies dbt models** - The agent runs automatically for all data changes
+- **Large-scale refactoring** - When many models change, automated validation catches issues you might miss
+- **Critical path changes** - When modifying models that power dashboards or reports
+- **Continuous integration** - As part of your CI pipeline to validate every change
+
+## Triggering Validation
+
+You can trigger the data review summary in three ways:
+
+1. **Automatic trigger** - Runs when `recce-cloud upload` executes in CI
+2. **Manual trigger from UI** - Click the Data Review button in a PR/MR session
+3. **GitHub comment** - Comment `/recce` on your GitHub PR to generate a new summary
+
+## Related
+
+- [Impact Analysis](impact-analysis.md) - How the agent analyzes change scope
+- [PR/MR Data Review Summary](../7-cicd/pr-mr-summary.md) - Understanding the summary output
+- [Setup CI](../7-cicd/setup-ci.md) - Configure automated validation
diff --git a/docs/4-what-the-agent-does/impact-analysis.md b/docs/4-what-the-agent-does/impact-analysis.md
new file mode 100644
index 0000000..b523bf6
--- /dev/null
+++ b/docs/4-what-the-agent-does/impact-analysis.md
@@ -0,0 +1,70 @@
+---
+title: Impact Analysis
+---
+
+# Impact Analysis
+
+A single column change can break dashboards, reports, and downstream models you never intended to affect. Impact analysis maps the full scope of your changes before they reach production, helping you understand exactly what will be affected.
+
+## How It Works
+
+The Recce Agent analyzes your changes at multiple levels to determine their true impact.
+
+### Lineage Analysis
+
+The agent traces dependencies through your dbt project to identify all models affected by your changes. It builds a graph of:
+
+- **Direct dependencies** - Models that reference your modified model
+- **Transitive dependencies** - Models further downstream in the lineage
+- **Column-level dependencies** - Specific columns that reference modified columns
+
+### Schema Comparison
+
+The agent compares schemas between your base and PR branches to detect:
+
+- Added columns
+- Removed columns
+- Renamed columns
+- Data type changes
+
+### Change Classification
+
+The agent categorizes each change based on its downstream impact:
+
+| Type | Description | Example |
+|------|-------------|---------|
+| **Breaking** | Affects all downstream models | Adding a filter condition, changing GROUP BY |
+| **Partial breaking** | Affects only models that reference specific modified columns | Removing or renaming a column |
+| **Non-breaking** | Does not affect downstream models | Adding a new column, formatting changes |
+
+### Downstream Effects
+
+For each modified model, the agent identifies:
+
+- Which downstream models are affected
+- Which specific columns in those models are impacted
+- Whether the impact is direct or indirect
+
+## When to Use
+
+- **Before merging any PR** - Understand the full scope of your changes
+- **During development** - Validate that changes are isolated to intended models
+- **Code review** - Help reviewers understand what will be affected
+- **Breaking change assessment** - Determine if coordination with downstream consumers is needed
+
+## Example: Column Change Impact
+
+When you modify a column like `stg_orders.status`:
+
+1. The agent identifies that the `orders` model selects this column directly (partial impact)
+2. The agent detects that the `customers` model uses `status` in a WHERE clause (full impact)
+3. The agent traces that `customer_segments` depends on `customers` (indirect impact)
+
+This shows that a seemingly simple column change affects models you may not have considered.
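The traversal in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DOWNSTREAM` mapping and the `downstream_impact` helper are hypothetical names invented here, the model names are taken from the example, and nothing below reflects how the Recce Agent is actually implemented.

```python
from collections import deque

# Hypothetical lineage for illustration: model -> models that select from it.
# The model names come from the example above; the dict layout is an
# assumption, not Recce's internal representation.
DOWNSTREAM = {
    "stg_orders": ["orders", "customers"],
    "customers": ["customer_segments"],
}


def downstream_impact(modified: str, graph: dict[str, list[str]]) -> dict[str, str]:
    """Label every model downstream of `modified` as 'direct' or 'indirect'."""
    impact: dict[str, str] = {}
    # Start with the models that reference the modified model directly.
    queue = deque((child, "direct") for child in graph.get(modified, []))
    while queue:
        model, kind = queue.popleft()
        if model in impact:  # already reached earlier in the breadth-first walk
            continue
        impact[model] = kind
        # Anything reached through another model is only indirectly affected.
        queue.extend((child, "indirect") for child in graph.get(model, []))
    return impact


print(downstream_impact("stg_orders", DOWNSTREAM))
# → {'orders': 'direct', 'customers': 'direct', 'customer_segments': 'indirect'}
```

A breadth-first walk is used so that a model reachable both directly and through another model keeps the `direct` label. Distinguishing partial from full impact would additionally need column-level information, which this sketch leaves out.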
+
+## Related
+
+- [Impact Radius](../4-downstream-impacts/impact-radius.md) - Visualize affected models
+- [Breaking Change Analysis](../4-downstream-impacts/breaking-change-analysis.md) - Understand change types
+- [Lineage Diff](../3-visualized-change/lineage.md) - See lineage changes
+- [Column-Level Lineage](../3-visualized-change/column-level-lineage.md) - Trace column dependencies
diff --git a/docs/4-what-the-agent-does/index.md b/docs/4-what-the-agent-does/index.md
new file mode 100644
index 0000000..40625b3
--- /dev/null
+++ b/docs/4-what-the-agent-does/index.md
@@ -0,0 +1,54 @@
+---
+title: What the Agent Does
+---
+
+# What the Recce Agent Does
+
+Data validation for pull requests is time-consuming. You need to understand what changed, identify downstream impacts, run the right checks, and communicate findings to reviewers. The Recce Agent automates this entire workflow.
+
+## How It Works
+
+The Recce Agent monitors your pull requests and acts as an automated data reviewer. When you open or update a PR that modifies dbt models, the agent:
+
+1. **Analyzes your changes** - Reads dbt artifacts and compares your branch against the base branch
+2. **Identifies impact** - Traces lineage to find all affected models and columns
+3. **Runs validation checks** - Executes schema diffs, row count comparisons, and other relevant checks
+4. **Generates insights** - Produces a data review summary with actionable findings
+5. **Posts results** - Adds the summary directly to your PR for reviewers to see
+
+This happens automatically in your CI/CD pipeline. No manual intervention required.
+
+## When to Use
+
+- **Every PR with data changes** - The agent runs automatically when dbt models are modified
+- **Complex refactoring** - When changes affect many models, the agent maps the full impact radius
+- **Critical model updates** - When validating changes to models that power dashboards or reports
+- **Team collaboration** - When reviewers need context about data changes without running Recce locally
+
+## Agent Capabilities
+
+The Recce Agent provides three core capabilities:
+
+### Automated Validation
+
+The agent determines what needs validation based on your changes and runs appropriate checks automatically. It executes schema comparisons, row count diffs, and other validation queries against your warehouse.
+
+[Learn more about Automated Validation](automated-validation.md)
+
+### Impact Analysis
+
+Before running checks, the agent analyzes your model changes to understand the scope of impact. It traces column-level lineage and categorizes changes as breaking, partial breaking, or non-breaking.
+
+[Learn more about Impact Analysis](impact-analysis.md)
+
+### Data Review Summary
+
+After validation completes, the agent generates a comprehensive summary that explains what changed, what was validated, and whether the changes are safe to merge.
+
+[Learn more about the Data Review Summary](../7-cicd/pr-mr-summary.md)
+
+## Related
+
+- [Data Developer Workflow](../3-using-recce/data-developer.md) - How developers validate changes
+- [Data Reviewer Workflow](../3-using-recce/data-reviewer.md) - How reviewers approve PRs
+- [CI/CD Getting Started](../7-cicd/ci-cd-getting-started.md) - Set up automated validation
diff --git a/mkdocs.yml b/mkdocs.yml
index 0c17898..c04aac2 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -69,6 +69,10 @@ nav:
     - 3-visualized-change/multi-models.md
   - Using Recce:
     - 3-using-recce/admin-setup.md
+  - What the Agent Does:
+    - 4-what-the-agent-does/index.md
+    - 4-what-the-agent-does/automated-validation.md
+    - 4-what-the-agent-does/impact-analysis.md
   - Downstream Impacts:
     #- 4-downstream-impacts/metadata-first.md
     - 4-downstream-impacts/impact-radius.md