diff --git a/docs/3-using-recce/data-developer.md b/docs/3-using-recce/data-developer.md
new file mode 100644
index 0000000..c0e13db
--- /dev/null
+++ b/docs/3-using-recce/data-developer.md
@@ -0,0 +1,162 @@
+---
+title: Data Developer Workflow
+---
+
+# Data Developer Workflow
+
+Validate data changes throughout your development lifecycle. This guide covers validating changes before creating a PR (dev sessions) and iterating on feedback after your PR is open.
+
+**Goal:** Validate data changes at every stage of development, from local work through PR merge.
+
+## Prerequisites
+
+- [x] Recce Cloud account
+- [x] dbt project with CI/CD configured for Recce
+- [x] Access to your data warehouse
+
+## Development Stages
+
+### Before PR: Dev Sessions
+
+Validate changes locally before pushing to remote. Dev sessions let you run Recce validation without creating a PR.
+
+#### Upload via Web UI
+
+1. Go to [Recce Cloud](https://cloud.reccehq.com)
+2. Navigate to your project
+3. Click **New Dev Session**
+4. Upload your dbt artifacts:
+    - `target/manifest.json`
+    - `target/catalog.json`
+5. Select your base environment for comparison
+
+**Expected result:** Dev session opens with lineage diff showing your changes.
+
+#### Upload via CLI
+
+Run from your dbt project directory:
+
+```bash
+recce-cloud upload --type dev
+```
+
+This uploads your current `target/` artifacts and creates a dev session.
+
+**Required files:**
+
+| File | Location | Generated by |
+|------|----------|--------------|
+| `manifest.json` | `target/` | `dbt run`, `dbt build`, or `dbt compile` |
+| `catalog.json` | `target/` | `dbt docs generate` |
+
+#### When to Use Dev Sessions
+
+- Testing changes before committing
+- Validating complex refactoring locally
+- Exploring impact without creating a PR
+- Sharing work-in-progress with teammates
+
+### After PR: CI/CD Validation
+
+Once you push changes and open a PR, the Recce Agent validates automatically.
+
+#### What Happens
+
+1. Your CI pipeline runs `recce-cloud upload`
+2. The agent compares your PR branch against the base branch
+3. The agent runs validation checks based on detected changes
+4. A data review summary posts to your PR
+
+#### Understanding the Agent Summary
+
+The summary includes:
+
+- **Change overview** - Which models changed and how
+- **Impact analysis** - Downstream models affected
+- **Validation results** - Schema diffs, row counts, and other checks
+- **Recommendations** - Suggested actions for review
+
+#### Fixing Issues
+
+When the agent identifies issues:
+
+1. Review the validation results in the PR comment
+2. Click **Launch Recce** to explore details in the web UI
+3. Identify the root cause using lineage and data diffs
+4. Make fixes in your branch
+5. Push changes - the agent re-validates automatically
+
+#### Iterating Until Checks Pass
+
+Each push triggers a new validation cycle:
+
+1. Agent re-analyzes your changes
+2. New validation results post to the PR
+3. Previous results are updated (not duplicated)
+4. Continue until all checks pass
+
+## Validation Techniques
+
+### Check Lineage First
+
+Start with lineage diff to understand your change scope:
+
+- Modified models highlighted in the DAG
+- Downstream impact visible at a glance
+- Schema changes shown per model
+
+### Validate Metadata
+
+Low-cost checks using model metadata:
+
+- **Schema diff** - Column additions, removals, type changes
+- **Row count diff** - Record count comparison (uses warehouse metadata)
+
+### Validate Data
+
+Higher-cost checks that query your warehouse:
+
+- **Value diff** - Column-level match percentage
+- **Profile diff** - Statistical comparison (count, distinct, min, max, avg)
+- **Histogram diff** - Distribution changes for numeric columns
+- **Top-K diff** - Distribution changes for categorical columns
+
+### Custom Queries
+
+For flexible validation, use query diff:
+
+```sql
+SELECT
+  date_trunc('month', order_date) AS month,
+  SUM(amount) AS revenue
+FROM {{ ref('orders') }}
+GROUP BY month
+ORDER BY month DESC
+```
+
+Add queries to your checklist for repeated use.
+
+## Verification
+
+Confirm your workflow works:
+
+1. Make a small model change locally
+2. Generate artifacts: `dbt build && dbt docs generate`
+3. Upload dev session: `recce-cloud upload --type dev`
+4. Verify session appears in Recce Cloud
+5. Create PR and confirm agent posts summary
+
+## Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| Dev session upload fails | Check artifacts exist in `target/`; run `dbt docs generate` |
+| Agent doesn't run on PR | Verify CI workflow includes `recce-cloud upload` |
+| Validation results missing | Check warehouse credentials in CI secrets |
+| Summary not appearing | Confirm `GITHUB_TOKEN` has PR write permissions |
+
+## Related
+
+- [Data Reviewer Workflow](data-reviewer.md) - How reviewers use Recce
+- [Admin Setup](admin-setup.md) - Set up your organization
+- [PR/MR Data Review](../7-cicd/pr-mr-summary.md) - Understanding agent summaries
diff --git a/docs/3-using-recce/data-reviewer.md b/docs/3-using-recce/data-reviewer.md
new file mode 100644
index 0000000..3aceef0
--- /dev/null
+++ b/docs/3-using-recce/data-reviewer.md
@@ -0,0 +1,125 @@
+---
+title: Data Reviewer Workflow
+---
+
+# Data Reviewer Workflow
+
+Review data changes in pull requests using Recce. Your admin set up Recce for your team - here's how to use it as a reviewer.
+
+**Goal:** Review and approve data changes in PRs with confidence.
+
+## Prerequisites
+
+- [x] Recce Cloud account (via team invitation)
+- [x] Access to the project in Recce Cloud
+- [x] PR with Recce validation results
+
+## Reviewing a PR
+
+### 1. Find the Data Review Summary
+
+When a PR modifies dbt models, the Recce Agent posts a summary comment:
+
+1. Open the PR in GitHub/GitLab
+2. Scroll to the Recce bot comment
+3. Review the summary sections
+
+**Expected result:** Summary shows change overview, impact analysis, and validation results.
+
+### 2. Understand the Summary
+
+The summary includes:
+
+| Section | What It Shows |
+|---------|---------------|
+| **Change Overview** | Which models changed and the type of change |
+| **Impact Analysis** | Downstream models affected by the changes |
+| **Validation Results** | Schema diffs, row counts, and check outcomes |
+| **Recommendations** | Suggested actions based on findings |
+
+### 3. Explore in Recce Cloud
+
+For deeper investigation:
+
+1. Click **Launch Recce** in the PR comment (or go to Recce Cloud)
+2. Select the PR session from the list
+3. Explore the changes interactively
+
+**What you can do:**
+
+- View lineage diff to see affected models
+- Drill into schema changes per model
+- Run additional data diffs (row count, profile, value)
+- Execute custom queries to investigate specific concerns
+
+### 4. Review Validation Results
+
+Check each validation result:
+
+- **Pass** - Change validated successfully
+- **Warning** - Review recommended but not blocking
+- **Fail** - Issue detected that needs attention
+
+For failures, click through to see:
+- What was compared
+- Expected vs actual results
+- Specific differences found
+
+### 5. Approve or Request Changes
+
+Based on your review:
+
+**Approve the PR:**
+
+- Validation results meet expectations
+- Impact scope is understood and acceptable
+- No unexpected data changes
+
+**Request changes:**
+
+- Validation failures need investigation
+- Impact scope is broader than expected
+- Questions about specific changes
+
+Leave comments referencing specific validation results to help the developer address issues.
+
+## Common Review Scenarios
+
+### Schema Changes
+
+When columns are added, removed, or modified:
+
+1. Check if downstream models are affected
+2. Verify the change is intentional
+3. Confirm breaking changes are coordinated
+
+### Row Count Differences
+
+When record counts change:
+
+1. Determine if the change is expected
+2. Check if filters or joins were modified
+3. Verify the magnitude is reasonable
+
+### Performance Impact
+
+When models are refactored:
+
+1. Compare query complexity
+2. Check for unintended full table scans
+3. Review impact on downstream refresh times
+
+## Verification
+
+Confirm you can review PRs:
+
+1. Open a PR with Recce validation results
+2. Find the Recce bot comment
+3. Click Launch Recce to open the session
+4. Navigate the lineage and view a diff result
+
+## Related
+
+- [Data Developer Workflow](data-developer.md) - How developers validate changes
+- [Admin Setup](admin-setup.md) - Organization and team setup
+- [Checklist](../6-collaboration/checklist.md) - Adding checks to track
diff --git a/mkdocs.yml b/mkdocs.yml
index c04aac2..ce8c926 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -69,6 +69,8 @@ nav:
     - 3-visualized-change/multi-models.md
   - Using Recce:
     - 3-using-recce/admin-setup.md
+    - 3-using-recce/data-developer.md
+    - 3-using-recce/data-reviewer.md
   - What the Agent Does:
     - 4-what-the-agent-does/index.md
     - 4-what-the-agent-does/automated-validation.md