Merged
162 changes: 162 additions & 0 deletions docs/3-using-recce/data-developer.md
@@ -0,0 +1,162 @@
---
title: Data Developer Workflow
---

# Data Developer Workflow

Validate data changes throughout your development lifecycle. This guide covers validating changes before creating a PR (dev sessions) and iterating on feedback after your PR is open.

**Goal:** Validate data changes at every stage of development, from local work through PR merge.

## Prerequisites

- [x] Recce Cloud account
- [x] dbt project with CI/CD configured for Recce
- [x] Access to your data warehouse

## Development Stages

### Before PR: Dev Sessions

Validate changes locally before pushing to remote. Dev sessions let you run Recce validation without creating a PR.

#### Upload via Web UI

1. Go to [Recce Cloud](https://cloud.reccehq.com)
2. Navigate to your project
3. Click **New Dev Session**
4. Upload your dbt artifacts:
    - `target/manifest.json`
    - `target/catalog.json`
5. Select your base environment for comparison

**Expected result:** Dev session opens with lineage diff showing your changes.

#### Upload via CLI

Run from your dbt project directory:

```bash
recce-cloud upload --type dev
```

This uploads your current `target/` artifacts and creates a dev session.

**Required files:**

| File | Location | Generated by |
|------|----------|--------------|
| `manifest.json` | `target/` | `dbt run`, `dbt build`, or `dbt compile` |
| `catalog.json` | `target/` | `dbt docs generate` |
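
A small pre-flight check can catch a missing artifact before the upload runs. The following is a minimal sketch, assuming the default `target/` directory; `check_artifacts` is a hypothetical helper name chosen for illustration and is not part of the Recce CLI.

```bash
# Hypothetical helper (not part of recce-cloud): verify that both dbt
# artifacts exist before uploading.
# Returns 0 if both files are present, 1 if either is missing.
check_artifacts() {
  local target_dir="${1:-target}"
  local missing=0
  local file
  for file in manifest.json catalog.json; do
    if [ ! -f "${target_dir}/${file}" ]; then
      echo "missing: ${target_dir}/${file}"
      missing=1
    fi
  done
  return "$missing"
}
```

One way to use it is to chain it in front of the upload, for example `check_artifacts target && recce-cloud upload --type dev`, so the upload only runs when both files are in place.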

#### When to Use Dev Sessions

- Testing changes before committing
- Validating complex refactoring locally
- Exploring impact without creating a PR
- Sharing work-in-progress with teammates

### After PR: CI/CD Validation

Once you push changes and open a PR, the Recce Agent validates automatically.

#### What Happens

1. Your CI pipeline runs `recce-cloud upload`
2. The agent compares your PR branch against the base branch
3. The agent runs validation checks based on detected changes
4. A data review summary posts to your PR

#### Understanding the Agent Summary

The summary includes:

- **Change overview** - Which models changed and how
- **Impact analysis** - Downstream models affected
- **Validation results** - Schema diffs, row counts, and other checks
- **Recommendations** - Suggested actions for review

#### Fixing Issues

When the agent identifies issues:

1. Review the validation results in the PR comment
2. Click **Launch Recce** to explore details in the web UI
3. Identify the root cause using lineage and data diffs
4. Make fixes in your branch
5. Push changes - the agent re-validates automatically

#### Iterating Until Checks Pass

Each push triggers a new validation cycle:

1. Agent re-analyzes your changes
2. New validation results post to the PR
3. Previous results are updated (not duplicated)
4. Continue until all checks pass

## Validation Techniques

### Check Lineage First

Start with lineage diff to understand your change scope:

- Modified models highlighted in the DAG
- Downstream impact visible at a glance
- Schema changes shown per model

### Validate Metadata

Low-cost checks using model metadata:

- **Schema diff** - Column additions, removals, type changes
- **Row count diff** - Record count comparison (uses warehouse metadata)

### Validate Data

Higher-cost checks that query your warehouse:

- **Value diff** - Column-level match percentage
- **Profile diff** - Statistical comparison (count, distinct, min, max, avg)
- **Histogram diff** - Distribution changes for numeric columns
- **Top-K diff** - Distribution changes for categorical columns

### Custom Queries

For checks that the built-in diffs don't cover, use query diff to run the same SQL against both environments and compare the results:

```sql
SELECT
    date_trunc('month', order_date) AS month,
    SUM(amount) AS revenue
FROM {{ ref('orders') }}
GROUP BY month
ORDER BY month DESC
```

Add queries to your checklist for repeated use.

## Verification

Confirm your workflow works:

1. Make a small model change locally
2. Generate artifacts: `dbt build && dbt docs generate`
3. Upload dev session: `recce-cloud upload --type dev`
4. Verify session appears in Recce Cloud
5. Create PR and confirm agent posts summary
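
Steps 2 and 3 can be wrapped in one function so the build, docs generation, and upload always run in the same order. This is a sketch under stated assumptions: `dev_session_upload` is a hypothetical name, and the `DBT` and `RECCE_CLOUD` variables are overrides introduced here for dry runs, not settings that dbt or Recce read.

```bash
# Sketch of the local loop: build, generate docs, then upload a dev session.
# DBT and RECCE_CLOUD are hypothetical override variables (assumptions of
# this example) that let the real commands be swapped out, e.g. with `echo`.
dev_session_upload() {
  local dbt_cmd="${DBT:-dbt}"
  local recce_cmd="${RECCE_CLOUD:-recce-cloud}"
  "$dbt_cmd" build || return 1          # writes target/manifest.json
  "$dbt_cmd" docs generate || return 1  # writes target/catalog.json
  "$recce_cmd" upload --type dev        # creates the dev session
}
```

Setting `DBT=echo` and `RECCE_CLOUD=echo` before calling the function prints the commands that would run instead of executing them. The function stops at the first failing step, so a broken build never reaches the upload.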

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Dev session upload fails | Check that `manifest.json` and `catalog.json` exist in `target/`; run `dbt docs generate` if `catalog.json` is missing |
| Agent doesn't run on PR | Verify CI workflow includes `recce-cloud upload` |
| Validation results missing | Check warehouse credentials in CI secrets |
| Summary not appearing | Confirm `GITHUB_TOKEN` has PR write permissions |
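
For the "Agent doesn't run on PR" case, a quick text search can confirm whether any workflow file mentions the upload command. The sketch below assumes GitHub Actions and its conventional `.github/workflows` directory; `workflow_has_upload` is a hypothetical helper, and other CI systems keep their configuration elsewhere.

```bash
# Hypothetical helper: succeed (exit 0) if any file under the given
# directory contains the literal text "recce-cloud upload".
# The default path assumes GitHub Actions.
workflow_has_upload() {
  local workflow_dir="${1:-.github/workflows}"
  # -r: search recursively, -q: no output, -F: match a fixed string
  grep -rqF "recce-cloud upload" "$workflow_dir" 2>/dev/null
}
```

Run it from the repository root, for example `workflow_has_upload || echo "no workflow runs recce-cloud upload"`. This is a plain text match, so it also matches commented-out lines; treat a hit as a hint rather than proof that the step runs.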

## Related

- [Data Reviewer Workflow](data-reviewer.md) - How reviewers use Recce
- [Admin Setup](admin-setup.md) - Set up your organization
- [PR/MR Data Review](../7-cicd/pr-mr-summary.md) - Understanding agent summaries
125 changes: 125 additions & 0 deletions docs/3-using-recce/data-reviewer.md
@@ -0,0 +1,125 @@
---
title: Data Reviewer Workflow
---

# Data Reviewer Workflow

Review data changes in pull requests using Recce. Your admin has already set up Recce for your team; this guide covers how to use it as a reviewer.

**Goal:** Review and approve data changes in PRs with confidence.

## Prerequisites

- [x] Recce Cloud account (via team invitation)
- [x] Access to the project in Recce Cloud
- [x] PR with Recce validation results

## Reviewing a PR

### 1. Find the Data Review Summary

When a PR modifies dbt models, the Recce Agent posts a summary comment:

1. Open the PR in GitHub/GitLab
2. Scroll to the Recce bot comment
3. Review the summary sections

**Expected result:** Summary shows change overview, impact analysis, and validation results.

### 2. Understand the Summary

The summary includes:

| Section | What It Shows |
|---------|---------------|
| **Change Overview** | Which models changed and the type of change |
| **Impact Analysis** | Downstream models affected by the changes |
| **Validation Results** | Schema diffs, row counts, and check outcomes |
| **Recommendations** | Suggested actions based on findings |

### 3. Explore in Recce Cloud

For deeper investigation:

1. Click **Launch Recce** in the PR comment (or go to Recce Cloud)
2. Select the PR session from the list
3. Explore the changes interactively

**What you can do:**

- View lineage diff to see affected models
- Drill into schema changes per model
- Run additional data diffs (row count, profile, value)
- Execute custom queries to investigate specific concerns

### 4. Review Validation Results

Check each validation result:

- **Pass** - Change validated successfully
- **Warning** - Review recommended but not blocking
- **Fail** - Issue detected that needs attention

For failures, click through to see:

- What was compared
- Expected vs actual results
- Specific differences found

### 5. Approve or Request Changes

Based on your review:

**Approve the PR:**

- Validation results meet expectations
- Impact scope is understood and acceptable
- No unexpected data changes

**Request changes:**

- Validation failures need investigation
- Impact scope is broader than expected
- Questions about specific changes

Leave comments referencing specific validation results to help the developer address issues.

## Common Review Scenarios

### Schema Changes

When columns are added, removed, or modified:

1. Check if downstream models are affected
2. Verify the change is intentional
3. Confirm breaking changes are coordinated

### Row Count Differences

When record counts change:

1. Determine if the change is expected
2. Check if filters or joins were modified
3. Verify the magnitude is reasonable
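
To judge whether the magnitude is reasonable, it can help to turn the two counts into a percent change. The following is a minimal sketch; `row_count_pct_change` is a hypothetical helper, and the counts are values you copy from the row count diff, not something the function fetches.

```bash
# Hypothetical helper: percent change from a base row count ($1) to the
# current row count ($2). Prints the value with two decimals, or "n/a"
# when the base count is zero (percent change is undefined there).
row_count_pct_change() {
  awk -v base="$1" -v cur="$2" 'BEGIN {
    if (base == 0) { print "n/a"; exit }
    printf "%.2f\n", (cur - base) / base * 100
  }'
}
```

For example, `row_count_pct_change 1000 1050` prints `5.00`, and `row_count_pct_change 200 150` prints `-25.00`. What counts as a reasonable threshold depends on the model, so the function only reports the number and leaves the judgment to the reviewer.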

### Performance Impact

When models are refactored:

1. Compare query complexity
2. Check for unintended full table scans
3. Review impact on downstream refresh times

## Verification

Confirm you can review PRs:

1. Open a PR with Recce validation results
2. Find the Recce bot comment
3. Click **Launch Recce** to open the session
4. Navigate the lineage and view a diff result

## Related

- [Data Developer Workflow](data-developer.md) - How developers validate changes
- [Admin Setup](admin-setup.md) - Organization and team setup
- [Checklist](../6-collaboration/checklist.md) - Adding checks to track
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -69,6 +69,8 @@ nav:
- 3-visualized-change/multi-models.md
- Using Recce:
- 3-using-recce/admin-setup.md
- 3-using-recce/data-developer.md
- 3-using-recce/data-reviewer.md
- What the Agent Does:
- 4-what-the-agent-does/index.md
- 4-what-the-agent-does/automated-validation.md