From 5fef6c45abe06ed706627036c078f7fced32059b Mon Sep 17 00:00:00 2001 From: Karen Hsieh Date: Tue, 3 Mar 2026 20:17:07 +0800 Subject: [PATCH] Add Reference section with configuration, state file, and CLI documentation - Create 7-reference/configuration.md: Document recce.yml preset check configuration with overview, parameters, and examples for all check types - Create 7-reference/state-file.md: Document state file format, saving methods, and usage patterns for development and PR review workflows - Create 7-reference/cli-reference.md: Document recce and recce-cloud CLI commands including server, run, summary, debug, and upload - Update mkdocs.yml: Rename "Technical Concepts" to "Reference" and update navigation paths to new 7-reference section Co-Authored-By: Claude Opus 4.5 --- docs/7-reference/cli-reference.md | 296 +++++++++++++++++++++++++ docs/7-reference/configuration.md | 353 ++++++++++++++++++++++++++++++ docs/7-reference/state-file.md | 145 ++++++++++++ mkdocs.yml | 7 +- 4 files changed, 798 insertions(+), 3 deletions(-) create mode 100644 docs/7-reference/cli-reference.md create mode 100644 docs/7-reference/configuration.md create mode 100644 docs/7-reference/state-file.md diff --git a/docs/7-reference/cli-reference.md b/docs/7-reference/cli-reference.md new file mode 100644 index 0000000..8a22fda --- /dev/null +++ b/docs/7-reference/cli-reference.md @@ -0,0 +1,296 @@ +--- +title: CLI Reference +--- + +# CLI Reference + +This reference documents the command-line interfaces for Recce OSS (`recce`) and Recce Cloud (`recce-cloud`). + +## Overview + +Recce provides two CLI tools: + +- **`recce`** - The open source CLI for local data validation and diffing +- **`recce-cloud`** - The cloud CLI for uploading artifacts to Recce Cloud in CI/CD workflows + +## recce Commands + +### recce server + +Starts the Recce web server for interactive data validation. 
+ +**Syntax:** + +```bash +recce server [OPTIONS] [STATE_FILE] +``` + +**Arguments:** + +| Argument | Description | +|----------|-------------| +| `STATE_FILE` | Optional path to a state file. If specified and exists, loads the state. If specified and does not exist, creates a new state file at that path. | + +**Options:** + +| Option | Description | +|--------|-------------| +| `--review` | Enable review mode. Uses dbt artifacts from the state file instead of `target/` and `target-base/` directories. | +| `--api-token ` | API token for Recce Cloud connection. | + +**Examples:** + +Start server with default settings: + +```bash +recce server +``` + +Start server with a state file: + +```bash +recce server my_recce_state.json +``` + +Start server in review mode (uses artifacts from state file): + +```bash +recce server --review my_recce_state.json +``` + +Start server with Recce Cloud connection: + +```bash +recce server --api-token +``` + +**Notes:** + +- The server runs on `http://localhost:8000` by default +- Requires dbt artifacts in `target/` (current) and `target-base/` (base) directories unless using `--review` mode +- State is auto-saved when the Save button is clicked in the UI + +### recce run + +Executes preset checks and saves results to a state file. + +**Syntax:** + +```bash +recce run [OPTIONS] +``` + +**Options:** + +| Option | Description | +|--------|-------------| +| `--state-file ` | Path to state file. 
Default: `recce_state.json` | +| `--github-pull-request-url ` | GitHub PR URL for CI context | + +**Examples:** + +Run all preset checks: + +```bash +recce run +``` + +Run checks and save to specific state file: + +```bash +recce run --state-file my_state.json +``` + +Run checks with GitHub PR context: + +```bash +recce run --github-pull-request-url ${{ github.event.pull_request.html_url }} +``` + +**Notes:** + +- Executes all checks defined in `recce.yml` +- Outputs results to the state file (default: `recce_state.json`) +- Used primarily in CI/CD pipelines for automated validation + +### recce summary + +Generates a summary report from a state file. + +**Syntax:** + +```bash +recce summary <STATE_FILE> +``` + +**Arguments:** + +| Argument | Description | +|----------|-------------| +| `STATE_FILE` | Path to the state file to summarize | + +**Examples:** + +Generate summary from state file: + +```bash +recce summary recce_state.json +``` + +Generate summary and save to file: + +```bash +recce summary recce_state.json > recce_summary.md +``` + +**Notes:** + +- Outputs summary in Markdown format +- Useful for generating PR comments in CI/CD workflows + +### recce debug + +Verifies Recce configuration and environment setup. + +**Syntax:** + +```bash +recce debug +``` + +**Examples:** + +```bash +recce debug +``` + +**Notes:** + +- Checks for required artifacts in `target/` and `target-base/` directories +- Verifies warehouse connection +- Useful for troubleshooting setup issues before launching the server + +## recce-cloud Commands + +The `recce-cloud` CLI is a lightweight tool for uploading dbt artifacts to Recce Cloud in CI/CD pipelines. + +### Installation + +```bash +pip install recce-cloud +``` + +### recce-cloud upload + +Uploads dbt artifacts to Recce Cloud. 
+ +**Syntax:** + +```bash +recce-cloud upload [OPTIONS] +``` + +**Options:** + +| Option | Description | +|--------|-------------| +| `--type ` | Session type: `prod` for baseline, omit for PR/MR auto-detection | +| `--target-path ` | Path to dbt artifacts directory. Default: `target/` | +| `--dry-run` | Test configuration without uploading | + +**Examples:** + +Upload baseline artifacts (for CD workflow): + +```bash +recce-cloud upload --type prod +``` + +Upload PR/MR artifacts (auto-detected): + +```bash +recce-cloud upload +``` + +Upload from custom artifact path: + +```bash +recce-cloud upload --target-path custom-target +``` + +Test configuration without uploading: + +```bash +recce-cloud upload --dry-run +``` + +**Notes:** + +- Automatically detects CI platform (GitHub Actions, GitLab CI) +- Uses `GITHUB_TOKEN` for GitHub authentication +- Uses `CI_JOB_TOKEN` for GitLab authentication +- Session type is auto-detected from PR/MR context when `--type` is omitted + +**Environment Variables:** + +| Platform | Variable | Description | +|----------|----------|-------------| +| GitHub | `GITHUB_TOKEN` | Authentication token (automatically available in Actions) | +| GitLab | `CI_JOB_TOKEN` | Authentication token (automatically available in CI/CD) | + +### Expected Output + +Successful upload displays: + +``` +─────────────────────────── CI Environment Detection ─────────────────────────── +Platform: github-actions +Session Type: prod +Commit SHA: abc123de... +Source Branch: main +Repository: your-org/your-repo +Info: Using GITHUB_TOKEN for platform-specific authentication +────────────────────────── Creating/touching session ─────────────────────────── +Session ID: f8b0f7ca-ea59-411d-abd8-88b80b9f87ad +Uploading manifest from path "target/manifest.json" +Uploading catalog from path "target/catalog.json" +Notifying upload completion... 
+──────────────────────────── Uploaded Successfully ───────────────────────────── +Uploaded dbt artifacts to Recce Cloud for session ID "f8b0f7ca-ea59-411d-abd8-88b80b9f87ad" +``` + +## Common Workflows + +### Local Development + +```bash +# Start interactive session +recce server + +# Or continue from saved state +recce server my_state.json +``` + +### CI/CD Pipeline + +```bash +# CD: Update baseline after merge to main +recce-cloud upload --type prod + +# CI: Upload PR artifacts for validation +recce-cloud upload +``` + +### Review Workflow + +```bash +# Reviewer loads state file in review mode +recce server --review recce_state.json +``` + +## Related + +- [Configuration](./configuration.md) - Preset check configuration in `recce.yml` +- [State File](./state-file.md) - State file format and usage +- [Setup CI](../7-cicd/setup-ci.md) - CI/CD integration guide +- [Setup CD](../7-cicd/setup-cd.md) - CD workflow setup diff --git a/docs/7-reference/configuration.md b/docs/7-reference/configuration.md new file mode 100644 index 0000000..157453b --- /dev/null +++ b/docs/7-reference/configuration.md @@ -0,0 +1,353 @@ +--- +title: Configuration +--- + +# Configuration + +This reference documents the `recce.yml` configuration file, which defines preset checks and their parameters for automated data validation. + +## Overview + +The config file for Recce is located in `recce.yml` in your dbt project root. Use this file to define preset checks that run automatically with `recce server` or `recce run`. + +## File Location + +| Path | Description | +|------|-------------| +| `recce.yml` | Main configuration file in dbt project root | + +## Preset Checks + +Preset checks define automated validations that execute when you run `recce server` or `recce run`. Each check specifies a type of comparison and its parameters. + +### Check Structure + +```yaml +# recce.yml +checks: + - name: Query diff of customers + description: | + This is the demo preset check. 
+ + Please run the query and paste the screenshot to the PR comment. + type: query_diff + params: + sql_template: select * from {{ ref("customers") }} + view_options: + primary_keys: + - customer_id +``` + +### Check Fields + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `name` | The title of the check | string | Yes | +| `description` | The description of the check | string | | +| `type` | The type of the check (see types below) | string | Yes | +| `params` | The parameters for running the check | object | Yes | +| `view_options` | The options for presenting the run result | object | | + +## Check Types + +### Row Count Diff + +Compares row counts between base and current environments. + +**Type:** `row_count_diff` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `node_names` | List of node names | `string[]` | *1 | +| `node_ids` | List of node IDs | `string[]` | *1 | +| `select` | Node selection syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | | +| `exclude` | Node exclusion syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | | +| `packages` | Package filter | `string[]` | | +| `view_mode` | Quick filter for changed models | `all`, `changed_models` | | + +**Notes:** + +*1: If `node_ids` or `node_names` is specified, it will be used; otherwise, nodes will be selected using the criteria defined by `select`, `exclude`, `packages`, and `view_mode`. 
+ +**Examples:** + +Using node selector: + +```yaml +checks: + - name: Row count for modified tables + description: Check row counts for all modified table models + type: row_count_diff + params: + select: state:modified,config.materialized:table + exclude: tag:dev +``` + +Using node names: + +```yaml +checks: + - name: Row count for key models + description: Check row counts for customers and orders + type: row_count_diff + params: + node_names: ['customers', 'orders'] +``` + +### Schema Diff + +Compares schema structure between base and current environments. + +**Type:** `schema_diff` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `node_id` | The node ID or list of node IDs to check | `string[]` | *1 | +| `select` | Node selection syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | | +| `exclude` | Node exclusion syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | | +| `packages` | Package filter | `string[]` | | +| `view_mode` | Quick filter for changed models | `all`, `changed_models` | | + +**Notes:** + +*1: If `node_id` is specified, it will be used; otherwise, nodes will be selected using the criteria defined by `select`, `exclude`, `packages`, and `view_mode`. + +**Examples:** + +Using node selector: + +```yaml +checks: + - name: Schema diff for modified models + description: Check schema changes for modified models and downstream + type: schema_diff + params: + select: state:modified+ + exclude: tag:dev +``` + +Using node ID: + +```yaml +checks: + - name: Schema diff for customers + description: Check schema for customers model + type: schema_diff + params: + node_id: model.jaffle_shop.customers +``` + +### Lineage Diff + +Compares lineage structure between base and current environments. 
+ +**Type:** `lineage_diff` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `select` | Node selection syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | | +| `exclude` | Node exclusion syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | | +| `packages` | Package filter | `string[]` | | +| `view_mode` | Quick filter for changed models | `all`, `changed_models` | | + +**Examples:** + +```yaml +checks: + - name: Lineage diff for modified models + description: Check lineage changes for modified models and downstream + type: lineage_diff + params: + select: state:modified+ + exclude: tag:dev +``` + +### Query + +Executes a custom SQL query in the current environment. + +**Type:** `query` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `sql_template` | SQL statement using Jinja templating | `string` | Yes | + +**Examples:** + +```yaml +checks: + - name: Customer count + description: Get total customer count + type: query + params: + sql_template: select count(*) from {{ ref("customers") }} +``` + +### Query Diff + +Compares query results between base and current environments. + +**Type:** `query_diff` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `sql_template` | SQL statement using Jinja templating | `string` | Yes | +| `base_sql_template` | SQL statement for base environment (if different) | `string` | | +| `primary_keys` | Primary keys for record identification | `string[]` | *1 | + +**Notes:** + +*1: If `primary_keys` is specified, the query diff is performed in the warehouse. Otherwise, the query result (up to the first 2000 records) is returned, and the diff is executed on the client side. 
+ +**Examples:** + +```yaml +checks: + - name: Customer data diff + description: Compare customer data between environments + type: query_diff + params: + sql_template: select * from {{ ref("customers") }} + primary_keys: + - customer_id +``` + +### Value Diff + +Compares values for a specific model between environments. + +**Type:** `value_diff` or `value_diff_detail` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `model` | The name of the model | `string` | Yes | +| `primary_key` | Primary key(s) for record identification | `string` or `string[]` | Yes | +| `columns` | List of columns to include in diff | `string[]` | | + +**Examples:** + +Value diff summary: + +```yaml +checks: + - name: Customer value diff + description: Compare customer values + type: value_diff + params: + model: customers + primary_key: customer_id +``` + +Value diff with detailed rows: + +```yaml +checks: + - name: Customer value diff (detailed) + description: Compare customer values with row details + type: value_diff_detail + params: + model: customers + primary_key: customer_id +``` + +### Profile Diff + +Compares statistical profiles of a model between environments. + +**Type:** `profile_diff` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `model` | The name of the model | `string` | Yes | + +**Examples:** + +```yaml +checks: + - name: Customer profile diff + description: Compare statistical profile of customers + type: profile_diff + params: + model: customers +``` + +### Histogram Diff + +Compares histogram distributions for a column between environments. 
+ +**Type:** `histogram_diff` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `model` | The name of the model | `string` | Yes | +| `column_name` | The name of the column | `string` | Yes | +| `column_type` | The type of the column | `string` | Yes | + +**Examples:** + +```yaml +checks: + - name: CLV histogram diff + description: Compare customer lifetime value distribution + type: histogram_diff + params: + model: customers + column_name: customer_lifetime_value + column_type: BIGINT +``` + +### Top-K Diff + +Compares top-K values for a column between environments. + +**Type:** `top_k_diff` + +**Parameters:** + +| Field | Description | Type | Required | +|-------|-------------|------|----------| +| `model` | The name of the model | `string` | Yes | +| `column_name` | The name of the column | `string` | Yes | +| `k` | Number of top items to include | `number` | Default: 50 | + +**Examples:** + +```yaml +checks: + - name: Top 50 customer values + description: Compare top 50 customer lifetime values + type: top_k_diff + params: + model: customers + column_name: customer_lifetime_value + k: 50 +``` + +## Default Behavior + +- Preset checks are loaded from `recce.yml` when Recce starts +- Checks execute automatically with `recce run` +- Results are stored in the state file +- View options control how results are displayed in the UI + +## Related + +- [Preset Checks Guide](../7-cicd/preset-checks.md) - How to use preset checks in workflows +- [State File](./state-file.md) - Understanding the state file format +- [CLI Reference](./cli-reference.md) - Command-line options for running checks diff --git a/docs/7-reference/state-file.md b/docs/7-reference/state-file.md new file mode 100644 index 0000000..4b36e6b --- /dev/null +++ b/docs/7-reference/state-file.md @@ -0,0 +1,145 @@ +--- +title: State File +--- + +# State File + +This reference documents the Recce state file format, which stores validation results, 
checks, and environment information. + +## Overview + +The state file represents the serialized state of a Recce instance. It is a JSON-formatted file containing checks, runs, environment artifacts, and runtime information. + +## File Format + +| Aspect | Details | +|--------|---------| +| Format | JSON | +| Default name | `recce_state.json` | +| Location | dbt project root | + +## Contents + +The state file contains the following information: + +- **Checks**: Data from the checks added to the checklist on the Checklist page +- **Runs**: Each diff execution in Recce corresponds to a run, similar to a query in a data warehouse. Typically, a single run submits a series of queries to the warehouse and retrieves the final results +- **Environment Artifacts**: Includes `manifest.json` and `catalog.json` files for both the base and current environments +- **Runtime Information**: Metadata such as Git branch details and pull request (PR) information from the CI runner + +## Saving the State File + +There are multiple ways to save the state file. + +### Save from Web UI + +Click the **Save** button at the top of the app. Recce will continuously write updates to the state file, effectively working like an auto-save feature, and persist the state until the Recce instance is closed. The file is saved with the specified filename in the directory where the `recce server` command is run. + +### Export from Web UI + +Click the **Export** button located in the top-right corner to download the current Recce state to any location on your machine. + +![Save and Export buttons](../assets/images/8-technical-concepts/state-file-save.png){: .shadow} + +### Start with State File + +Provide a state file as an argument when launching Recce. If the file does not exist, Recce will create a state file and start with an empty state. If the file exists, Recce will load the state and continue working from it. 
+ +```bash +recce server my_recce_state.json +``` + +## Using the State File + +The state file can be used in several ways: + +### Continue State + +Launch Recce with the specified state file to continue from where you left off. + +```bash +recce server my_recce_state.json +``` + +### Review Mode + +Running Recce with the `--review` option enables review mode. In this mode, Recce uses the dbt artifacts in the state file instead of those in the `target/` and `target-base/` directories. This option is useful for distinguishing between development and review purposes. + +```bash +recce server --review my_recce_state.json +``` + +### Import Checklist + +To preserve favorite checks across different branches, import a checklist by clicking the **Import** button at the top of the checklist. + +### Continue from `recce run` + +Execute the checks in the specified state file. + +```bash +recce run --state-file my_recce_state.json +``` + +## Workflow Examples + +### Development Workflow + +In the development workflow, the state file acts as a session for developing a feature. It allows you to store checks to verify the diff results against the base environment. + +1. Run the recce server without a state file + + ```bash + recce server + ``` + +2. Add checks to the checklist +3. Save the state by clicking the **Save** or **Export** button +4. Resume your session by launching Recce with the specific state file + + ```bash + recce server recce_issue_1.json + ``` + +![State File For Development](../assets/images/8-technical-concepts/state-file-dev.png) + +### PR Review Workflow + +During the PR review process, the state file serves as a communication medium between the submitter and the reviewer. + +1. Start the Recce server without a state file + + ```bash + recce server + ``` + +2. Add checks to the checklist +3. Save the state by clicking the **Save** or **Export** button +4. Share the state file with the reviewer or attach it as a comment in the pull request +5. 
The reviewer opens the state file in review mode to inspect the results + + ```bash + recce server --review recce_issue_1.json + ``` + +![State File For PR Review](../assets/images/8-technical-concepts/state-file-pr.png) + +## CLI Options + +| Option | Description | +|--------|-------------| +| `recce server <STATE_FILE>` | Start server with state file | +| `recce server --review <STATE_FILE>` | Start in review mode using state file artifacts | +| `recce run --state-file <STATE_FILE>` | Run checks from state file | + +## Default Behavior + +- If no state file is specified, Recce starts with an empty state +- State files are saved to the current working directory by default +- Review mode (`--review`) uses artifacts embedded in the state file + +## Related + +- [CLI Reference](./cli-reference.md) - Command-line options +- [Configuration](./configuration.md) - Preset check configuration +- [PR Review Workflow](../7-cicd/scenario-pr-review.md) - Using state files in reviews diff --git a/mkdocs.yml b/mkdocs.yml index bee8774..95f17c3 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -92,9 +92,10 @@ nav: - 7-cicd/preset-checks.md - 7-cicd/best-practices-prep-env.md - - Technical Concepts: - - 8-technical-concepts/state-file.md - - 8-technical-concepts/configuration.md + - Reference: + - 7-reference/configuration.md + - 7-reference/state-file.md + - 7-reference/cli-reference.md - Blog: "https://blog.reccehq.com" - Changelog: "https://reccehq.com/changelog/"