Merged
61 changes: 61 additions & 0 deletions docs/4-what-the-agent-does/automated-validation.md
@@ -0,0 +1,61 @@
---
title: Automated Validation
---

# Automated Validation

Manual data validation slows down every pull request. Developers must remember which checks to run, execute them correctly, and communicate results to reviewers. The Recce Agent automates this process, running the right validation checks based on what changed in your PR.

## How It Works

When a PR is opened or updated, the Recce Agent analyzes your changes and determines what needs validation.

### 1. PR Triggers the Agent

Your CI/CD pipeline runs `recce-cloud upload` when dbt metadata is updated. This triggers the agent to analyze the changes.

### 2. Agent Analyzes Changes

The agent reads dbt artifacts from both your base branch and PR branch. It identifies:

- Which models were modified
- What schema changes occurred
- Which downstream models are affected
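
As a rough illustration of the first item, modified models can be found by comparing per-model checksums in the two `manifest.json` artifacts. This is a minimal sketch, not the agent's actual implementation: the function name `modified_models` and the sample data are hypothetical, and it assumes each manifest has already been parsed into a dict with the usual `nodes`, `resource_type`, and `checksum` fields.

```python
def modified_models(base_manifest: dict, pr_manifest: dict) -> dict:
    """Compare model checksums between two parsed dbt manifests.

    Hypothetical helper for illustration; not part of Recce.
    """
    def model_checksums(manifest: dict) -> dict:
        # Keep only model nodes and map unique_id -> checksum string.
        return {
            unique_id: (node.get("checksum") or {}).get("checksum")
            for unique_id, node in manifest.get("nodes", {}).items()
            if node.get("resource_type") == "model"
        }

    base = model_checksums(base_manifest)
    pr = model_checksums(pr_manifest)
    return {
        "added": sorted(pr.keys() - base.keys()),
        "removed": sorted(base.keys() - pr.keys()),
        "modified": sorted(
            uid for uid in base.keys() & pr.keys() if base[uid] != pr[uid]
        ),
    }


# Made-up manifests for demonstration only.
base = {"nodes": {
    "model.shop.orders": {"resource_type": "model", "checksum": {"checksum": "aaa"}},
    "model.shop.customers": {"resource_type": "model", "checksum": {"checksum": "bbb"}},
}}
pr = {"nodes": {
    "model.shop.orders": {"resource_type": "model", "checksum": {"checksum": "zzz"}},
    "model.shop.customers": {"resource_type": "model", "checksum": {"checksum": "bbb"}},
    "model.shop.segments": {"resource_type": "model", "checksum": {"checksum": "ccc"}},
}}
print(modified_models(base, pr))
```

In practice the two dicts would come from `json.load` on the base and PR `target/manifest.json` files.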

### 3. Agent Runs Validation

Based on the analysis, the agent executes appropriate validation checks against your warehouse:

- **Schema diff** - Detects added, removed, or modified columns
- **Row count diff** - Compares record counts between branches
- **Profile diff** - Analyzes statistical changes in column values
- **Breaking change analysis** - Identifies changes that affect downstream models
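
To make the schema diff concrete, the sketch below compares two column-to-type mappings and reports added, removed, and modified columns. It is an illustrative assumption about what such a check computes, not Recce's code; `schema_diff` and the sample columns are hypothetical names.

```python
def schema_diff(base_columns: dict, pr_columns: dict) -> dict:
    """Return added, removed, and type-changed columns.

    Both arguments map column name -> data type (hypothetical input shape).
    """
    added = {c: t for c, t in pr_columns.items() if c not in base_columns}
    removed = {c: t for c, t in base_columns.items() if c not in pr_columns}
    modified = {
        c: (base_columns[c], pr_columns[c])  # (old type, new type)
        for c in base_columns
        if c in pr_columns and base_columns[c] != pr_columns[c]
    }
    return {"added": added, "removed": removed, "modified": modified}


# Made-up schemas for one model on the base and PR branches.
base_schema = {"order_id": "integer", "status": "varchar", "amount": "numeric"}
pr_schema = {"order_id": "bigint", "status": "varchar", "created_at": "timestamp"}
print(schema_diff(base_schema, pr_schema))
```

A row count diff follows the same pattern: run the same count query against each environment and compare the two numbers.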

### 4. Agent Posts Summary

The agent generates a data review summary and posts it directly to your PR. Reviewers see:

- What changed and why it matters
- Validation results with pass/fail status
- Recommended actions for review

## When to Use

- **Every PR that modifies dbt models** - The agent runs automatically for all data changes
- **Large-scale refactoring** - When many models change, automated validation catches issues you might miss
- **Critical path changes** - When modifying models that power dashboards or reports
- **Continuous integration** - As part of your CI pipeline to validate every change

## Triggering Validation

You can trigger the data review summary in three ways:

1. **Automatic trigger** - Runs when `recce-cloud upload` executes in CI
2. **Manual trigger from UI** - Click the Data Review button in a PR/MR session
3. **GitHub comment** - Comment `/recce` on your GitHub PR to generate a new summary

## Related

- [Impact Analysis](impact-analysis.md) - How the agent analyzes change scope
- [PR/MR Data Review Summary](../7-cicd/pr-mr-summary.md) - Understanding the summary output
- [Setup CI](../7-cicd/setup-ci.md) - Configure automated validation
70 changes: 70 additions & 0 deletions docs/4-what-the-agent-does/impact-analysis.md
@@ -0,0 +1,70 @@
---
title: Impact Analysis
---

# Impact Analysis

A single column change can break dashboards, reports, and downstream models you never intended to affect. Impact analysis maps the full scope of your changes before they reach production, helping you understand exactly what will be affected.

## How It Works

The Recce Agent analyzes your changes at multiple levels to determine their true impact.

### Lineage Analysis

The agent traces dependencies through your dbt project to identify all models affected by your changes. It builds a graph of:

- **Direct dependencies** - Models that reference your modified model
- **Transitive dependencies** - Models further downstream in the lineage
- **Column-level dependencies** - Specific columns that reference modified columns
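
The difference between direct and transitive dependencies can be sketched as a breadth-first walk over the model graph. This is a simplified, model-level illustration under assumed inputs: the `children` mapping and the function name `downstream_models` are hypothetical, and the agent's real traversal (including column-level lineage) may differ.

```python
from collections import deque


def downstream_models(children: dict, start: str) -> dict:
    """Return every model downstream of `start` with its distance.

    Distance 1 means a direct dependency; anything greater is transitive.
    `children` maps a model name to the models that reference it.
    """
    distance = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in children.get(node, []):
            # Skip models already visited (and the start node itself).
            if child not in distance and child != start:
                distance[child] = depth + 1
                queue.append((child, depth + 1))
    return distance


# Hypothetical lineage: stg_orders feeds orders and customers,
# and customers feeds customer_segments.
lineage = {
    "stg_orders": ["orders", "customers"],
    "customers": ["customer_segments"],
}
impact = downstream_models(lineage, "stg_orders")
direct = sorted(m for m, d in impact.items() if d == 1)
transitive = sorted(m for m, d in impact.items() if d > 1)
print(direct, transitive)
```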

### Schema Comparison

The agent compares schemas between your base and PR branches to detect:

- Added columns
- Removed columns
- Renamed columns
- Data type changes

### Change Classification

The agent categorizes each change based on its downstream impact:

| Type | Description | Example |
|------|-------------|---------|
| **Breaking** | Affects all downstream models | Adding a filter condition, changing GROUP BY |
| **Partial breaking** | Affects only models that reference specific modified columns | Removing or renaming a column |
| **Non-breaking** | Does not affect downstream models | Adding a new column, formatting changes |
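
The table translates into a simple rule for deciding which downstream models need attention. The sketch below is one possible reading of that rule, with hypothetical names (`affected_models`, the `downstream_columns` mapping) and made-up sample data; it is not the agent's implementation.

```python
def affected_models(change_type: str, changed_columns: set,
                    downstream_columns: dict) -> list:
    """Apply the classification table to a set of downstream models.

    `downstream_columns` maps each downstream model to the set of upstream
    columns it references (hypothetical input shape).
    """
    if change_type == "breaking":
        # Every downstream model is affected.
        return sorted(downstream_columns)
    if change_type == "partial_breaking":
        # Only models that reference a changed column are affected.
        return sorted(
            model for model, cols in downstream_columns.items()
            if cols & changed_columns
        )
    if change_type == "non_breaking":
        return []
    raise ValueError(f"unknown change type: {change_type}")


# Made-up downstream references for illustration.
refs = {
    "orders": {"order_id", "status"},
    "payments": {"order_id"},
}
print(affected_models("breaking", {"status"}, refs))
print(affected_models("partial_breaking", {"status"}, refs))
print(affected_models("non_breaking", {"status"}, refs))
```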

### Downstream Effects

For each modified model, the agent identifies:

- Which downstream models are affected
- Which specific columns in those models are impacted
- Whether the impact is direct or indirect

## When to Use

- **Before merging any PR** - Understand the full scope of your changes
- **During development** - Validate that changes are isolated to intended models
- **Code review** - Help reviewers understand what will be affected
- **Breaking change assessment** - Determine if coordination with downstream consumers is needed

## Example: Column Change Impact

When you modify a column like `stg_orders.status`:

1. The agent identifies that the `orders` model selects this column directly (partial impact)
2. The agent detects that the `customers` model uses `status` in a WHERE clause (full impact)
3. The agent traces that `customer_segments` depends on `customers` (indirect impact)

The result shows that a seemingly simple column change reaches models you may not have considered.

## Related

- [Impact Radius](../4-downstream-impacts/impact-radius.md) - Visualize affected models
- [Breaking Change Analysis](../4-downstream-impacts/breaking-change-analysis.md) - Understand change types
- [Lineage Diff](../3-visualized-change/lineage.md) - See lineage changes
- [Column-Level Lineage](../3-visualized-change/column-level-lineage.md) - Trace column dependencies
54 changes: 54 additions & 0 deletions docs/4-what-the-agent-does/index.md
@@ -0,0 +1,54 @@
---
title: What the Agent Does
---

# What the Recce Agent Does

Data validation for pull requests is time-consuming. You need to understand what changed, identify downstream impacts, run the right checks, and communicate findings to reviewers. The Recce Agent automates this entire workflow.

## How It Works

The Recce Agent monitors your pull requests and acts as an automated data reviewer. When you open or update a PR that modifies dbt models, the agent:

1. **Analyzes your changes** - Reads dbt artifacts and compares your branch against the base branch
2. **Identifies impact** - Traces lineage to find all affected models and columns
3. **Runs validation checks** - Executes schema diffs, row count comparisons, and other relevant checks
4. **Generates insights** - Produces a data review summary with actionable findings
5. **Posts results** - Adds the summary directly to your PR for reviewers to see

This happens automatically in your CI/CD pipeline, with no manual intervention required.

## When to Use

- **Every PR with data changes** - The agent runs automatically when dbt models are modified
- **Complex refactoring** - When changes affect many models, the agent maps the full impact radius
- **Critical model updates** - When validating changes to models that power dashboards or reports
- **Team collaboration** - When reviewers need context about data changes without running Recce locally

## Agent Capabilities

The Recce Agent provides three core capabilities:

### Automated Validation

The agent determines what needs validation based on your changes and runs appropriate checks automatically. It executes schema comparisons, row count diffs, and other validation queries against your warehouse.

[Learn more about Automated Validation](automated-validation.md)

### Impact Analysis

Before running checks, the agent analyzes your model changes to understand the scope of impact. It traces column-level lineage and categorizes changes as breaking, partial breaking, or non-breaking.

[Learn more about Impact Analysis](impact-analysis.md)

### Data Review Summary

After validation completes, the agent generates a comprehensive summary that explains what changed, what was validated, and whether the changes are safe to merge.

[Learn more about the Data Review Summary](../7-cicd/pr-mr-summary.md)

## Related

- [Data Developer Workflow](../3-using-recce/data-developer.md) - How developers validate changes
- [Data Reviewer Workflow](../3-using-recce/data-reviewer.md) - How reviewers approve PRs
- [CI/CD Getting Started](../7-cicd/ci-cd-getting-started.md) - Set up automated validation
4 changes: 4 additions & 0 deletions mkdocs.yml
@@ -69,6 +69,10 @@ nav:
    - 3-visualized-change/multi-models.md
  - Using Recce:
    - 3-using-recce/admin-setup.md
  - What the Agent Does:
    - 4-what-the-agent-does/index.md
    - 4-what-the-agent-does/automated-validation.md
    - 4-what-the-agent-does/impact-analysis.md
  - Downstream Impacts:
    #- 4-downstream-impacts/metadata-first.md
    - 4-downstream-impacts/impact-radius.md