From 6f406a3412953ca3220744fb71eebc20a4b4279f Mon Sep 17 00:00:00 2001 From: Karen Hsieh Date: Fri, 27 Feb 2026 21:31:15 +0800 Subject: [PATCH 1/7] Add dbt Cloud setup guide Tutorial for setting up Recce Cloud when dbt runs on dbt Cloud, covering artifact retrieval via dbt Cloud API and GitHub Actions workflow configuration. Co-Authored-By: Claude Opus 4.5 --- docs/2-getting-started/dbt-cloud-setup.md | 256 ++++++++++++++++++++++ 1 file changed, 256 insertions(+) create mode 100644 docs/2-getting-started/dbt-cloud-setup.md diff --git a/docs/2-getting-started/dbt-cloud-setup.md b/docs/2-getting-started/dbt-cloud-setup.md new file mode 100644 index 0000000..ccfab97 --- /dev/null +++ b/docs/2-getting-started/dbt-cloud-setup.md @@ -0,0 +1,256 @@ +--- +title: dbt Cloud Setup +--- + +# dbt Cloud Setup + +This guide helps you set up Recce Cloud when your dbt project runs on dbt Cloud. Since dbt Cloud manages your dbt runs, you'll retrieve artifacts via the dbt Cloud API instead of generating them locally. + +## Goal + +After completing this setup, you'll have automated data validation on every pull request, with Recce comparing your PR changes against production. The workflow retrieves dbt artifacts directly from dbt Cloud and uploads them to Recce Cloud for validation. + +## Prerequisites + +- [x] **Recce Cloud account**: free trial at [cloud.reccehq.com](https://cloud.reccehq.com) +- [x] **dbt Cloud account**: with CI and CD jobs configured +- [x] **dbt Cloud API token**: with read access to job artifacts +- [x] **GitHub repository**: with admin access to add workflows and secrets +- [x] **Data warehouse**: read access for data diffing + +## How it works + +When your dbt project runs on dbt Cloud, the artifacts (`manifest.json`, `catalog.json`) are stored in dbt Cloud rather than your local environment. To use Recce, you'll: + +1. Retrieve Base artifacts from your CD job (production runs) +2. Retrieve Current artifacts from your CI job (PR runs) +3. 
Upload both to Recce Cloud for validation + +## Setup steps + +### 1. Enable "Generate docs on run" in dbt Cloud + +Recce requires `catalog.json` for schema comparisons. Enable documentation generation for both your CI and CD jobs in dbt Cloud. + +**For CD jobs (production):** + +1. Go to your CD job settings in dbt Cloud +2. Under **Execution settings**, enable **Generate docs on run** + +**For CI jobs (pull requests):** + +1. Go to your CI job settings in dbt Cloud +2. Under **Advanced settings**, enable **Generate docs on run** + +!!! note + Without this setting, dbt Cloud won't generate `catalog.json`, and Recce won't be able to compare schemas between environments. + +### 2. Get your dbt Cloud credentials + +Collect the following from your dbt Cloud account: + +| Credential | Where to find it | +| --- | --- | +| **Account ID** | URL when viewing any job: `cloud.getdbt.com/deploy/{ACCOUNT_ID}/projects/...` | +| **CD Job ID** | URL of your production/CD job: `...jobs/{JOB_ID}` | +| **CI Job ID** | URL of your PR/CI job: `...jobs/{JOB_ID}` | +| **API Token** | Account Settings > API Tokens > Create Service Token | + +!!! tip + Create a service token with "Job Admin" or "Member" permissions. This allows read access to job artifacts. + +### 3. Configure GitHub secrets + +Add the following secrets to your GitHub repository (Settings > Secrets and variables > Actions): + +**dbt Cloud secrets:** + +- `DBT_CLOUD_API_TOKEN` - Your dbt Cloud API token +- `DBT_CLOUD_ACCOUNT_ID` - Your dbt Cloud account ID +- `DBT_CLOUD_CD_JOB_ID` - Your production/CD job ID +- `DBT_CLOUD_CI_JOB_ID` - Your PR/CI job ID + +**Recce Cloud secrets:** + +- `RECCE_STATE_PASSWORD` - Password to encrypt state files (create any secure string) + +**Data warehouse secrets** (for data diffing): + +Add your warehouse credentials based on your adapter. For Snowflake: + +- `SNOWFLAKE_ACCOUNT` +- `SNOWFLAKE_USER` +- `SNOWFLAKE_PASSWORD` +- `SNOWFLAKE_SCHEMA` + +!!! 
note + `GITHUB_TOKEN` is automatically provided by GitHub Actions, no configuration needed. + +### 4. Create the GitHub Actions workflow + +Create `.github/workflows/recce-dbt-cloud.yml` with the workflow configuration. The workflow: + +1. **Retrieves Base artifacts** from your CD job run matching the PR's base commit +2. **Retrieves Current artifacts** from your CI job run for the PR's head commit +3. **Runs Recce validation** and uploads results to Recce Cloud +4. **Posts a summary comment** on the pull request + +```yaml +name: Recce with dbt Cloud + +on: + pull_request: + branches: [main] + +env: + DBT_CLOUD_API_BASE: "https://cloud.getdbt.com/api/v2/accounts/${{ secrets.DBT_CLOUD_ACCOUNT_ID }}" + DBT_CLOUD_API_TOKEN: ${{ secrets.DBT_CLOUD_API_TOKEN }} + +jobs: + recce-validation: + name: Validate PR with Recce + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - uses: actions/setup-python@v5 + with: + python-version: "3.10" + cache: "pip" + + - name: Install dependencies + run: pip install -r requirements.txt + + - name: Retrieve Base artifacts (CD job) + env: + DBT_CLOUD_CD_JOB_ID: ${{ secrets.DBT_CLOUD_CD_JOB_ID }} + BASE_GITHUB_SHA: ${{ github.event.pull_request.base.sha }} + run: | + set -eo pipefail + CD_RUNS_URL="${DBT_CLOUD_API_BASE}/runs/?job_definition_id=${DBT_CLOUD_CD_JOB_ID}&order_by=-id" + CD_RUNS_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CD_RUNS_URL}") + DBT_CLOUD_CD_RUN_ID=$(echo "${CD_RUNS_RESPONSE}" | jq -r ".data[] | select(.git_sha == \"${BASE_GITHUB_SHA}\") | .id" | head -n1) + echo "DBT_CLOUD_CD_RUN_ID=${DBT_CLOUD_CD_RUN_ID}" >> $GITHUB_ENV + mkdir -p target-base + for artifact in manifest.json catalog.json; do + ARTIFACT_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CD_RUN_ID}/artifacts/${artifact}" + curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target-base/${artifact}" + done + + - name: Retrieve Current artifacts (CI job) + env: 
+ DBT_CLOUD_CI_JOB_ID: ${{ secrets.DBT_CLOUD_CI_JOB_ID }} + CURRENT_GITHUB_SHA: ${{ github.event.pull_request.head.sha }} + run: | + set -eo pipefail + CI_RUNS_URL="${DBT_CLOUD_API_BASE}/runs/?job_definition_id=${DBT_CLOUD_CI_JOB_ID}&order_by=-id" + fetch_ci_run_id() { + CI_RUNS_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CI_RUNS_URL}") + echo "${CI_RUNS_RESPONSE}" | jq -r ".data[] | select(.git_sha == \"${CURRENT_GITHUB_SHA}\") | .id" | head -n1 + } + DBT_CLOUD_CI_RUN_ID=$(fetch_ci_run_id) + while [ -z "$DBT_CLOUD_CI_RUN_ID" ]; do + sleep 5 + DBT_CLOUD_CI_RUN_ID=$(fetch_ci_run_id) + done + echo "DBT_CLOUD_CI_RUN_ID=${DBT_CLOUD_CI_RUN_ID}" >> $GITHUB_ENV + CI_RUN_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CI_RUN_ID}/" + while true; do + CI_RUN_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CI_RUN_URL}") + CI_RUN_SUCCESS=$(echo "${CI_RUN_RESPONSE}" | jq '.data.is_complete and .data.is_success') + CI_RUN_FAILED=$(echo "${CI_RUN_RESPONSE}" | jq '.data.is_complete and (.data.is_error or .data.is_cancelled)') + if $CI_RUN_SUCCESS; then + echo "CI job completed successfully." + break + elif $CI_RUN_FAILED; then + status=$(echo ${CI_RUN_RESPONSE} | jq -r '.data.status_humanized') + echo "CI job failed or was cancelled. 
Status: $status" + exit 1 + fi + sleep 5 + done + mkdir -p target + for artifact in manifest.json catalog.json; do + ARTIFACT_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CI_RUN_ID}/artifacts/${artifact}" + curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target/${artifact}" + done + + - name: Run Recce validation + env: + SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }} + SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }} + SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }} + SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}" + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + RECCE_STATE_PASSWORD: ${{ secrets.RECCE_STATE_PASSWORD }} + run: recce run --cloud + + - name: Generate Recce summary + id: recce-summary + env: + SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }} + SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }} + SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }} + SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}" + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + RECCE_STATE_PASSWORD: ${{ secrets.RECCE_STATE_PASSWORD }} + run: | + set -eo pipefail + recce summary --cloud > recce_summary.md + cat recce_summary.md >> $GITHUB_STEP_SUMMARY + + - name: Comment on pull request + uses: thollander/actions-comment-pull-request@v2 + with: + filePath: recce_summary.md + comment_tag: recce +``` + +### 5. Adapt for your data warehouse + +The workflow above uses Snowflake. Update the environment variables in the "Run Recce validation" and "Generate Recce summary" steps to match your warehouse configuration. + +For other warehouses, replace the Snowflake variables with your adapter's required credentials. See [Connect to Warehouse](../5-data-diffing/connect-to-warehouse.md) for adapter-specific configuration. + +## Verification + +After setting up: + +1. **Create a test PR** with a small model change +2. **Wait for dbt Cloud CI job** to complete +3. **Check GitHub Actions** - the Recce workflow should run +4. 
**Review the PR comment** - Recce validation summary appears +5. **Launch Recce instance** - from Recce Cloud dashboard, open the PR session + +!!! tip + If the workflow fails on the first run, check that your CD job has run on the base commit. The workflow looks for artifacts from a specific git SHA. + +## Troubleshooting + +| Issue | Solution | +| --- | --- | +| "CD run not found" | Ensure your CD job has run on the base branch commit. Try rebasing your PR to trigger a new CD run. | +| "CI job timeout" | The workflow waits for dbt Cloud CI to complete. Check if your CI job is stuck or taking longer than expected. | +| "Artifact not found" | Verify "Generate docs on run" is enabled for both CI and CD jobs. | +| "API authentication failed" | Check your `DBT_CLOUD_API_TOKEN` has correct permissions and is stored in GitHub secrets. | +| "Warehouse connection failed" | Verify warehouse credentials in GitHub secrets. Check IP whitelisting if applicable. | +| No PR comment appears | Ensure `GITHUB_TOKEN` has write permissions for pull requests. Check workflow permissions. | + +### CD job timing considerations + +The workflow retrieves Base artifacts from the CD job run that matches the PR's base commit SHA. If your CD job runs on a schedule (not on every merge), the base commit might not have artifacts available. 
+ +**Solutions:** + +- Configure CD to run on merge to main (recommended) +- Rebase your PR to a commit that has CD artifacts +- Modify the workflow to use the latest CD run instead of commit-matched artifacts + +## Related + +- [Get Started with Recce Cloud](./start-free-with-cloud.md) - Standard setup for self-hosted dbt +- [Setup CD](../7-cicd/setup-cd.md) - CD workflow configuration +- [Setup CI](../7-cicd/setup-ci.md) - CI workflow configuration +- [Best Practices for Preparing Environments](../7-cicd/best-practices-prep-env.md) - Environment strategy guidance From 679121f39b65ab1cc64dcffe223e068818150ba6 Mon Sep 17 00:00:00 2001 From: Karen Hsieh Date: Fri, 27 Feb 2026 22:09:03 +0800 Subject: [PATCH 2/7] Add 'Choose your setup' decision tree to CI/CD section - Restructure setup guidance into nested decision tree - Question 1: Where dbt runs (self-hosted vs dbt Cloud) - If self-hosted: sub-options for GitHub Actions vs GitLab/CircleCI - Question 2: Environment complexity (simple vs advanced) - Link to dbt Cloud Setup for platform users - Link to Environment Setup for advanced configurations --- docs/2-getting-started/start-free-with-cloud.md | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/docs/2-getting-started/start-free-with-cloud.md b/docs/2-getting-started/start-free-with-cloud.md index 4edc0d0..d5a7b64 100644 --- a/docs/2-getting-started/start-free-with-cloud.md +++ b/docs/2-getting-started/start-free-with-cloud.md @@ -87,9 +87,21 @@ First, go to [cloud.reccehq.com](https://cloud.reccehq.com) and create your free ### 3. Add Recce to CI/CD -This step adds CI/CD workflow files to your repository. The agent creates these automatically. For manual setup, create and merge a PR with the templates below. +This step adds CI/CD workflow files to your repository. The web agent detects your setup and guides you through. For manual setup, use the templates below. -> **Note**: This guide uses GitHub Actions. 
For other CI/CD platforms, see [Setup CD](../7-cicd/setup-cd.md) and [Setup CI](../7-cicd/setup-ci.md). +#### Choose your setup + +1. How do you run dbt? + + - **You own your dbt run** + - **GitHub Actions**: Continue with this guide + - **GitLab CI, CircleCI**: See [Setup CD](../7-cicd/setup-cd.md) and [Setup CI](../7-cicd/setup-ci.md) + - **You run dbt on a platform** (dbt Cloud, Paradigms, etc.): See [dbt Cloud Setup](dbt-cloud-setup.md) + +2. How complex is your environment? + + - **Simple** (prod and dev targets): Continue with this guide. We use per-PR schemas for fast setup. To learn why, see [Environment Setup](environment-setup.md). + - **Advanced** (multiple schemas, staging environments): See [Environment Setup](environment-setup.md) #### Set Up Profile.yml From 99e1e4bc3babf451f68a1ba638eee2406890c156 Mon Sep 17 00:00:00 2001 From: Karen Hsieh Date: Fri, 27 Feb 2026 22:14:52 +0800 Subject: [PATCH 3/7] Fix nested list indentation in Choose your setup section --- docs/2-getting-started/start-free-with-cloud.md | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/docs/2-getting-started/start-free-with-cloud.md b/docs/2-getting-started/start-free-with-cloud.md index d5a7b64..9a1f27e 100644 --- a/docs/2-getting-started/start-free-with-cloud.md +++ b/docs/2-getting-started/start-free-with-cloud.md @@ -92,16 +92,14 @@ This step adds CI/CD workflow files to your repository. The web agent detects yo #### Choose your setup 1. How do you run dbt? 
- - - **You own your dbt run** - - **GitHub Actions**: Continue with this guide - - **GitLab CI, CircleCI**: See [Setup CD](../7-cicd/setup-cd.md) and [Setup CI](../7-cicd/setup-ci.md) - - **You run dbt on a platform** (dbt Cloud, Paradigms, etc.): See [dbt Cloud Setup](dbt-cloud-setup.md) + - **You own your dbt run** + - **GitHub Actions**: Continue with this guide + - **GitLab CI, CircleCI**: See [Setup CD](../7-cicd/setup-cd.md) and [Setup CI](../7-cicd/setup-ci.md) + - **You run dbt on a platform** (dbt Cloud, Paradime, etc.): See [dbt Cloud Setup](dbt-cloud-setup.md) 2. How complex is your environment? - - - **Simple** (prod and dev targets): Continue with this guide. We use per-PR schemas for fast setup. To learn why, see [Environment Setup](environment-setup.md). - - **Advanced** (multiple schemas, staging environments): See [Environment Setup](environment-setup.md) + - **Simple** (prod and dev targets): Continue with this guide. We use per-PR schemas for fast setup. To learn why, see [Environment Setup](environment-setup.md). + - **Advanced** (multiple schemas, staging environments): See [Environment Setup](environment-setup.md) #### Set Up Profile.yml From 7e5dec7df057c7c479658ed35131c3ba2e02b095 Mon Sep 17 00:00:00 2001 From: Karen Hsieh Date: Fri, 27 Feb 2026 22:32:15 +0800 Subject: [PATCH 4/7] Add dbt-cloud-setup to nav and update setup-cd/ci links - Add dbt Cloud Setup to mkdocs.yml navigation - Update setup-cd and setup-ci links to new location (2-getting-started/) --- docs/2-getting-started/start-free-with-cloud.md | 2 +- mkdocs.yml | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/2-getting-started/start-free-with-cloud.md b/docs/2-getting-started/start-free-with-cloud.md index 9a1f27e..5dbdc2c 100644 --- a/docs/2-getting-started/start-free-with-cloud.md +++ b/docs/2-getting-started/start-free-with-cloud.md @@ -94,7 +94,7 @@ This step adds CI/CD workflow files to your repository. The web agent detects yo 1. How do you run dbt? 
- **You own your dbt run** - **GitHub Actions**: Continue with this guide - - **GitLab CI, CircleCI**: See [Setup CD](../7-cicd/setup-cd.md) and [Setup CI](../7-cicd/setup-ci.md) + - **GitLab CI, CircleCI**: See [Setup CD](setup-cd.md) and [Setup CI](setup-ci.md) - **You run dbt on a platform** (dbt Cloud, Paradime, etc.): See [dbt Cloud Setup](dbt-cloud-setup.md) 2. How complex is your environment? diff --git a/mkdocs.yml b/mkdocs.yml index bee8774..10c1c93 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -54,6 +54,7 @@ nav: - Getting Started: - 2-getting-started/oss-vs-cloud.md - 2-getting-started/start-free-with-cloud.md + - dbt Cloud Setup: 2-getting-started/dbt-cloud-setup.md #- 2-getting-started/cloud-5min-tutorial.md - 2-getting-started/installation.md - Claude Plugin: 2-getting-started/claude-plugin.md From cb8241bbfa27c89ad9d538752f0e9a8f4adb3773 Mon Sep 17 00:00:00 2001 From: Karen Hsieh Date: Fri, 27 Feb 2026 23:20:55 +0800 Subject: [PATCH 5/7] Apply QA and AISEO fixes to PR2c docs - Add problem statement to dbt-cloud-setup opening - Remove redundant content between intro and goal - Spell out PR and CI/CD acronyms on first use - Fix "bolded steps" references to explicit commands - Improve image alt text for accessibility - Rename "Related" to "Next steps" with descriptive links Co-Authored-By: Claude Opus 4.5 --- docs/2-getting-started/dbt-cloud-setup.md | 26 ++++++++++--------- .../start-free-with-cloud.md | 10 +++---- 2 files changed, 19 insertions(+), 17 deletions(-) diff --git a/docs/2-getting-started/dbt-cloud-setup.md b/docs/2-getting-started/dbt-cloud-setup.md index ccfab97..7cee6a0 100644 --- a/docs/2-getting-started/dbt-cloud-setup.md +++ b/docs/2-getting-started/dbt-cloud-setup.md @@ -4,27 +4,29 @@ title: dbt Cloud Setup # dbt Cloud Setup -This guide helps you set up Recce Cloud when your dbt project runs on dbt Cloud. Since dbt Cloud manages your dbt runs, you'll retrieve artifacts via the dbt Cloud API instead of generating them locally. 
+When your dbt project runs on dbt Cloud, validating pull request (PR) data changes requires retrieving artifacts from the dbt Cloud API rather than generating them locally. ## Goal -After completing this setup, you'll have automated data validation on every pull request, with Recce comparing your PR changes against production. The workflow retrieves dbt artifacts directly from dbt Cloud and uploads them to Recce Cloud for validation. +After completing this tutorial, every PR triggers automated data validation. Recce compares your PR changes against production, with results visible in Recce Cloud. ## Prerequisites - [x] **Recce Cloud account**: free trial at [cloud.reccehq.com](https://cloud.reccehq.com) -- [x] **dbt Cloud account**: with CI and CD jobs configured +- [x] **dbt Cloud account**: with CI (continuous integration) and CD (continuous deployment) jobs configured - [x] **dbt Cloud API token**: with read access to job artifacts - [x] **GitHub repository**: with admin access to add workflows and secrets - [x] **Data warehouse**: read access for data diffing -## How it works +## How Recce retrieves dbt Cloud artifacts -When your dbt project runs on dbt Cloud, the artifacts (`manifest.json`, `catalog.json`) are stored in dbt Cloud rather than your local environment. To use Recce, you'll: +Recce needs both base (production) and current (PR) dbt artifacts to compare changes. When using dbt Cloud, these artifacts live in dbt Cloud's API rather than your local filesystem. Your GitHub Actions workflow retrieves them via API calls before running Recce validation. -1. Retrieve Base artifacts from your CD job (production runs) -2. Retrieve Current artifacts from your CI job (PR runs) -3. Upload both to Recce Cloud for validation +The workflow: + +1. Retrieves Base artifacts from your CD job (production deployments that run on merge to main) +2. Retrieves Current artifacts from your CI job (PR-triggered builds that validate changes) +3. 
Uploads both to Recce Cloud for validation ## Setup steps @@ -248,9 +250,9 @@ The workflow retrieves Base artifacts from the CD job run that matches the PR's - Rebase your PR to a commit that has CD artifacts - Modify the workflow to use the latest CD run instead of commit-matched artifacts -## Related +## Next steps - [Get Started with Recce Cloud](./start-free-with-cloud.md) - Standard setup for self-hosted dbt -- [Setup CD](../7-cicd/setup-cd.md) - CD workflow configuration -- [Setup CI](../7-cicd/setup-ci.md) - CI workflow configuration -- [Best Practices for Preparing Environments](../7-cicd/best-practices-prep-env.md) - Environment strategy guidance +- [Configure CD to establish your production baseline](../7-cicd/setup-cd.md) +- [Configure CI for automated PR validation](../7-cicd/setup-ci.md) +- [Learn environment strategies for reliable comparisons](../7-cicd/best-practices-prep-env.md) diff --git a/docs/2-getting-started/start-free-with-cloud.md b/docs/2-getting-started/start-free-with-cloud.md index 5dbdc2c..996c354 100644 --- a/docs/2-getting-started/start-free-with-cloud.md +++ b/docs/2-getting-started/start-free-with-cloud.md @@ -12,7 +12,7 @@ This tutorial helps analytics engineers and data engineers set up Recce Cloud to ## Goal -Reviewing data changes in PRs is error-prone without visibility into downstream impact. After setup, the Recce agent reviews your data changes on every PR—showing what changed and what it affects. +Reviewing data changes in pull requests (PRs) is error-prone without visibility into downstream impact. After completing this tutorial, the Recce agent reviews your data changes on every PR—showing what changed and what it affects. To validate changes, Recce compares **Base** vs **Current** environments: @@ -22,7 +22,7 @@ To validate changes, Recce compares **Base** vs **Current** environments: Recce requires dbt artifacts from both environments. 
This guide covers: - dbt profile configuration for Base and Current -- CI/CD workflow setup +- CI/CD (Continuous Integration/Continuous Deployment) workflow setup For accurate comparisons, both environments should use consistent data ranges. See [Best Practices for Preparing Environments](../7-cicd/best-practices-prep-env.md) for environment strategies. @@ -225,7 +225,7 @@ This sample workflow: - **Calls `dbt docs generate`** to generate artifacts - **Calls `recce-cloud upload --type prod`** to upload the Base metadata, using `GITHUB_TOKEN` for authentication -To integrate into your own configuration, ensure your workflow includes the bolded steps. +To integrate into your own configuration, ensure your workflow calls `dbt docs generate` and `recce-cloud upload --type prod`. #### Set Up Current Metadata Updates @@ -293,7 +293,7 @@ This sample workflow: - **Calls `dbt docs generate --target ci`** to generate artifacts for the PR branch - **Calls `recce-cloud upload`** to upload the Current metadata, using `GITHUB_TOKEN` for authentication -To integrate into your own configuration, ensure your workflow includes the bolded steps. +To integrate into your own configuration, ensure your workflow calls `dbt docs generate --target ci` and `recce-cloud upload`. ### 4. Merge the CI/CD change @@ -309,7 +309,7 @@ In Recce Cloud, verify you see: - Production Metadata: Updated automatically - PR Sessions: all open PRs appear in the list. Only PRs with uploaded metadata can be launched for review. -![Recce Cloud dashboard after setup](../assets/images/2-getting-started/cloud-onboarding-completed.png){: .shadow} +![Recce Cloud dashboard showing connected GitHub integration, warehouse connection, and production metadata status](../assets/images/2-getting-started/cloud-onboarding-completed.png){: .shadow} ### 5. 
Final Steps From 35c70d7c4450ad79fa133edcc8ef8aad52a86eb0 Mon Sep 17 00:00:00 2001 From: Karen Hsieh Date: Mon, 2 Mar 2026 11:00:43 +0800 Subject: [PATCH 6/7] docs: fix terminology - use PR instead of pull request Co-Authored-By: Claude Opus 4.5 --- docs/2-getting-started/dbt-cloud-setup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/2-getting-started/dbt-cloud-setup.md b/docs/2-getting-started/dbt-cloud-setup.md index 7cee6a0..596ce0d 100644 --- a/docs/2-getting-started/dbt-cloud-setup.md +++ b/docs/2-getting-started/dbt-cloud-setup.md @@ -4,7 +4,7 @@ title: dbt Cloud Setup # dbt Cloud Setup -When your dbt project runs on dbt Cloud, validating pull request (PR) data changes requires retrieving artifacts from the dbt Cloud API rather than generating them locally. +When your dbt project runs on dbt Cloud, validating PR data changes requires retrieving artifacts from the dbt Cloud API rather than generating them locally. ## Goal From 3afff2f912bd39f73f1e07e2be83a0bb1c576ade Mon Sep 17 00:00:00 2001 From: Karen Hsieh Date: Tue, 3 Mar 2026 12:35:33 +0800 Subject: [PATCH 7/7] docs: simplify dbt Cloud setup - remove warehouse, use recce-cloud upload Based on review feedback: - Remove warehouse connection requirement (runs in Recce Cloud) - Remove "Run Recce validation", "Generate Recce summary", "Comment on PR" steps - Split into two workflows: base (CD) and PR (CI) - Use recce-cloud upload --type prod for base - Use recce-cloud upload for PR Co-Authored-By: Claude Opus 4.5 --- docs/2-getting-started/dbt-cloud-setup.md | 167 ++++++++++------------ 1 file changed, 74 insertions(+), 93 deletions(-) diff --git a/docs/2-getting-started/dbt-cloud-setup.md b/docs/2-getting-started/dbt-cloud-setup.md index 596ce0d..9164684 100644 --- a/docs/2-getting-started/dbt-cloud-setup.md +++ b/docs/2-getting-started/dbt-cloud-setup.md @@ -16,17 +16,15 @@ After completing this tutorial, every PR triggers automated data validation. 
Rec - [x] **dbt Cloud account**: with CI (continuous integration) and CD (continuous deployment) jobs configured - [x] **dbt Cloud API token**: with read access to job artifacts - [x] **GitHub repository**: with admin access to add workflows and secrets -- [x] **Data warehouse**: read access for data diffing ## How Recce retrieves dbt Cloud artifacts -Recce needs both base (production) and current (PR) dbt artifacts to compare changes. When using dbt Cloud, these artifacts live in dbt Cloud's API rather than your local filesystem. Your GitHub Actions workflow retrieves them via API calls before running Recce validation. +Recce needs both base (production) and current (PR) dbt artifacts to compare changes. When using dbt Cloud, these artifacts live in dbt Cloud's API rather than your local filesystem. Your GitHub Actions workflows retrieve them via API calls and upload to Recce Cloud. -The workflow: +Two workflows handle this: -1. Retrieves Base artifacts from your CD job (production deployments that run on merge to main) -2. Retrieves Current artifacts from your CI job (PR-triggered builds that validate changes) -3. Uploads both to Recce Cloud for validation +1. **Base workflow** (on merge to main): Downloads production artifacts from your CD job → uploads with `recce-cloud upload --type prod` +2. **PR workflow** (on pull request): Downloads PR artifacts from your CI job → uploads with `recce-cloud upload` ## Setup steps @@ -72,76 +70,87 @@ Add the following secrets to your GitHub repository (Settings > Secrets and vari - `DBT_CLOUD_CD_JOB_ID` - Your production/CD job ID - `DBT_CLOUD_CI_JOB_ID` - Your PR/CI job ID -**Recce Cloud secrets:** - -- `RECCE_STATE_PASSWORD` - Password to encrypt state files (create any secure string) - -**Data warehouse secrets** (for data diffing): - -Add your warehouse credentials based on your adapter. For Snowflake: - -- `SNOWFLAKE_ACCOUNT` -- `SNOWFLAKE_USER` -- `SNOWFLAKE_PASSWORD` -- `SNOWFLAKE_SCHEMA` - !!! 
note `GITHUB_TOKEN` is automatically provided by GitHub Actions, no configuration needed. -### 4. Create the GitHub Actions workflow - -Create `.github/workflows/recce-dbt-cloud.yml` with the workflow configuration. The workflow: +### 4. Create the base workflow (CD) -1. **Retrieves Base artifacts** from your CD job run matching the PR's base commit -2. **Retrieves Current artifacts** from your CI job run for the PR's head commit -3. **Runs Recce validation** and uploads results to Recce Cloud -4. **Posts a summary comment** on the pull request +Create `.github/workflows/recce-base.yml` to update your production baseline when merging to main. ```yaml -name: Recce with dbt Cloud +name: Update Base Metadata (dbt Cloud) on: - pull_request: + push: branches: [main] + workflow_dispatch: env: DBT_CLOUD_API_BASE: "https://cloud.getdbt.com/api/v2/accounts/${{ secrets.DBT_CLOUD_ACCOUNT_ID }}" DBT_CLOUD_API_TOKEN: ${{ secrets.DBT_CLOUD_API_TOKEN }} jobs: - recce-validation: - name: Validate PR with Recce + update-base: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - with: - fetch-depth: 0 - uses: actions/setup-python@v5 with: python-version: "3.10" - cache: "pip" - - name: Install dependencies - run: pip install -r requirements.txt + - name: Install recce-cloud + run: pip install recce-cloud - - name: Retrieve Base artifacts (CD job) + - name: Retrieve artifacts from CD job env: DBT_CLOUD_CD_JOB_ID: ${{ secrets.DBT_CLOUD_CD_JOB_ID }} - BASE_GITHUB_SHA: ${{ github.event.pull_request.base.sha }} run: | set -eo pipefail - CD_RUNS_URL="${DBT_CLOUD_API_BASE}/runs/?job_definition_id=${DBT_CLOUD_CD_JOB_ID}&order_by=-id" + CD_RUNS_URL="${DBT_CLOUD_API_BASE}/runs/?job_definition_id=${DBT_CLOUD_CD_JOB_ID}&order_by=-id&limit=1" CD_RUNS_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CD_RUNS_URL}") - DBT_CLOUD_CD_RUN_ID=$(echo "${CD_RUNS_RESPONSE}" | jq -r ".data[] | select(.git_sha == \"${BASE_GITHUB_SHA}\") | .id" | head -n1) - echo 
"DBT_CLOUD_CD_RUN_ID=${DBT_CLOUD_CD_RUN_ID}" >> $GITHUB_ENV - mkdir -p target-base + DBT_CLOUD_CD_RUN_ID=$(echo "${CD_RUNS_RESPONSE}" | jq -r ".data[0].id") + mkdir -p target for artifact in manifest.json catalog.json; do ARTIFACT_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CD_RUN_ID}/artifacts/${artifact}" - curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target-base/${artifact}" + curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target/${artifact}" done - - name: Retrieve Current artifacts (CI job) + - name: Upload to Recce Cloud + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + run: recce-cloud upload --type prod +``` + +### 5. Create the PR workflow (CI) + +Create `.github/workflows/recce-pr.yml` to validate PR changes. + +```yaml +name: Validate PR (dbt Cloud) + +on: + pull_request: + branches: [main] + +env: + DBT_CLOUD_API_BASE: "https://cloud.getdbt.com/api/v2/accounts/${{ secrets.DBT_CLOUD_ACCOUNT_ID }}" + DBT_CLOUD_API_TOKEN: ${{ secrets.DBT_CLOUD_API_TOKEN }} + +jobs: + validate-pr: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - uses: actions/setup-python@v5 + with: + python-version: "3.10" + + - name: Install recce-cloud + run: pip install recce-cloud + + - name: Wait for dbt Cloud CI job env: DBT_CLOUD_CI_JOB_ID: ${{ secrets.DBT_CLOUD_CI_JOB_ID }} CURRENT_GITHUB_SHA: ${{ github.event.pull_request.head.sha }} @@ -154,7 +163,8 @@ jobs: } DBT_CLOUD_CI_RUN_ID=$(fetch_ci_run_id) while [ -z "$DBT_CLOUD_CI_RUN_ID" ]; do - sleep 5 + echo "Waiting for dbt Cloud CI job to start..." 
+ sleep 10 DBT_CLOUD_CI_RUN_ID=$(fetch_ci_run_id) done echo "DBT_CLOUD_CI_RUN_ID=${DBT_CLOUD_CI_RUN_ID}" >> $GITHUB_ENV @@ -164,70 +174,44 @@ jobs: CI_RUN_SUCCESS=$(echo "${CI_RUN_RESPONSE}" | jq '.data.is_complete and .data.is_success') CI_RUN_FAILED=$(echo "${CI_RUN_RESPONSE}" | jq '.data.is_complete and (.data.is_error or .data.is_cancelled)') if $CI_RUN_SUCCESS; then - echo "CI job completed successfully." + echo "dbt Cloud CI job completed successfully." break elif $CI_RUN_FAILED; then status=$(echo ${CI_RUN_RESPONSE} | jq -r '.data.status_humanized') - echo "CI job failed or was cancelled. Status: $status" + echo "dbt Cloud CI job failed or was cancelled. Status: $status" exit 1 fi - sleep 5 + echo "Waiting for dbt Cloud CI job to complete..." + sleep 10 done + + - name: Retrieve artifacts from CI job + run: | + set -eo pipefail mkdir -p target for artifact in manifest.json catalog.json; do ARTIFACT_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CI_RUN_ID}/artifacts/${artifact}" curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target/${artifact}" done - - name: Run Recce validation - env: - SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }} - SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }} - SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }} - SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}" - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - RECCE_STATE_PASSWORD: ${{ secrets.RECCE_STATE_PASSWORD }} - run: recce run --cloud - - - name: Generate Recce summary - id: recce-summary + - name: Upload to Recce Cloud env: - SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }} - SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }} - SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }} - SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}" GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - RECCE_STATE_PASSWORD: ${{ secrets.RECCE_STATE_PASSWORD }} - run: | - set -eo pipefail - recce summary --cloud > recce_summary.md - cat 
recce_summary.md >> $GITHUB_STEP_SUMMARY - - - name: Comment on pull request - uses: thollander/actions-comment-pull-request@v2 - with: - filePath: recce_summary.md - comment_tag: recce + run: recce-cloud upload ``` -### 5. Adapt for your data warehouse - -The workflow above uses Snowflake. Update the environment variables in the "Run Recce validation" and "Generate Recce summary" steps to match your warehouse configuration. - -For other warehouses, replace the Snowflake variables with your adapter's required credentials. See [Connect to Warehouse](../5-data-diffing/connect-to-warehouse.md) for adapter-specific configuration. - ## Verification After setting up: -1. **Create a test PR** with a small model change -2. **Wait for dbt Cloud CI job** to complete -3. **Check GitHub Actions** - the Recce workflow should run -4. **Review the PR comment** - Recce validation summary appears -5. **Launch Recce instance** - from Recce Cloud dashboard, open the PR session +1. **Trigger the base workflow** - Push to main or run it manually to upload the production baseline +2. **Create a test PR** with a small model change +3. **Wait for dbt Cloud CI job** to complete +4. **Check GitHub Actions** - the Recce PR workflow waits for the dbt Cloud CI job to finish, then uploads the PR artifacts +5. **Open Recce Cloud** - the PR session appears with validation results !!! tip - If the workflow fails on the first run, check that your CD job has run on the base commit. The workflow looks for artifacts from a specific git SHA. + Run the base workflow first to establish your production baseline. The PR workflow compares against this baseline. ## Troubleshooting @@ -237,18 +221,15 @@ After setting up: | "CI job timeout" | The workflow waits for dbt Cloud CI to complete. Check if your CI job is stuck or taking longer than expected. | | "Artifact not found" | Verify "Generate docs on run" is enabled for both CI and CD jobs.
| | "API authentication failed" | Check your `DBT_CLOUD_API_TOKEN` has correct permissions and is stored in GitHub secrets. | -| "Warehouse connection failed" | Verify warehouse credentials in GitHub secrets. Check IP whitelisting if applicable. | -| No PR comment appears | Ensure `GITHUB_TOKEN` has write permissions for pull requests. Check workflow permissions. | ### CD job timing considerations -The workflow retrieves Base artifacts from the CD job run that matches the PR's base commit SHA. If your CD job runs on a schedule (not on every merge), the base commit might not have artifacts available. +The base workflow retrieves artifacts from the latest CD job run. For accurate comparisons, ensure your dbt Cloud CD job runs on every merge to main. -**Solutions:** +If your CD job runs on a schedule: -- Configure CD to run on merge to main (recommended) -- Rebase your PR to a commit that has CD artifacts -- Modify the workflow to use the latest CD run instead of commit-matched artifacts +- The uploaded baseline may not reflect the latest changes merged to main +- Consider triggering the CD job manually, then re-running the base workflow, before validating PRs ## Next steps