feat(aml): add lineage command for extracting metadata #17

Open
nvquanghuy wants to merge 6 commits into master from feature/lineage-command

nvquanghuy commented Mar 31, 2026

Summary

  • Adds holistics aml lineage [path] command to extract lineage metadata from compiled AML projects
  • Outputs a normalized JSON structure optimized for integration with data catalogs (e.g., DataHub)
  • Parses models, datasets, dashboards, and charts with their relationships
  • Uses @holistics/cli-core AQL type checker to accurately extract model.field references from dataset metric definitions (instead of fragile regex)
  • Supports options: --output <file>, --entities <filter>, --compact, --compiled <file>

Changes

  • src/lineage.ts — lineage transformation logic; calls extractAqlReferences() from cli-core when available, falls back to regex
  • src/index.ts — adds aml lineage subcommand; adds --compiled flag for testing with pre-compiled JSON
  • src/loader.ts — local dev support via CLI_CORE_PATH env var; debug log goes to stderr to avoid polluting JSON output
  • src/tests/lineage.test.ts — comprehensive tests including AQL type-checked extraction
  • src/tests/fixtures/compiled-sample.json — test fixtures

Dependencies

  • Depends on holistics/holistics-core#2798 for extractAqlReferences() / createDatasetFromCompiled() exports from @holistics/cli-core

Test plan

  • pnpm test — all tests pass
  • CLI_CORE_PATH=/path/to/cli-core pnpm cli aml lineage /path/to/aml-project — AQL extraction works with local cli-core
  • pnpm cli aml lineage --compiled compiled.json: the --compiled flag loads pre-compiled JSON
  • Verify metric fields_referenced correctly lists model/field pairs from AQL expressions
  • Test --output, --entities, --compact options

🤖 Generated with Claude Code

Adds a new `holistics aml lineage` command that extracts lineage metadata
from compiled AML projects and outputs a normalized JSON structure optimized
for integration with data catalogs like DataHub.

Features:
- Parses TableModel and QueryModel entities with fields (dimensions/measures)
- Extracts Dataset and Dashboard entities with chart definitions
- Builds lineage edges: model->source, dataset->model, chart->model, dashboard->chart
- Supports multiple table name formats (BigQuery, PostgreSQL, simple)
- Options: --output file, --entities filter, --compact JSON
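
To illustrate the table name handling listed above, here is a minimal sketch of how a source table reference might be normalized. The `TableRef` shape and the `parseTableName` helper are hypothetical names introduced for this example; the actual logic in src/lineage.ts is not shown in this PR description and may differ.

```typescript
// Hypothetical normalized shape for a data warehouse table reference.
interface TableRef {
  database?: string; // e.g. a BigQuery project
  schema?: string;   // e.g. a BigQuery dataset or PostgreSQL schema
  table: string;
}

// Assumption: names arrive as `project.dataset.table` (BigQuery),
// "schema"."table" (PostgreSQL), or a bare table name.
// Quoted identifiers that themselves contain a dot are not handled here.
function parseTableName(raw: string): TableRef {
  const parts = raw
    .replace(/[`"]/g, "") // strip backticks and double quotes
    .split(".")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  if (parts.length === 0) {
    throw new Error("Empty table name");
  }

  // Read from the right: the last part is always the table.
  const ref: TableRef = { table: parts[parts.length - 1] };
  if (parts.length >= 2) ref.schema = parts[parts.length - 2];
  if (parts.length >= 3) ref.database = parts[parts.length - 3];
  return ref;
}
```

Reading the parts from the right keeps the bare-name and two-part cases on the same code path as the fully qualified one.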

Includes comprehensive tests with vitest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nvquanghuy nvquanghuy requested a review from khanhhuy March 31, 2026 08:56
nvquanghuy and others added 5 commits April 1, 2026 09:53
Adds explicit chart→dataset relationship for proper hierarchy:
- Dashboard → Chart → Dataset → Model → DW Table

This complements chart_to_model (granular field-level lineage)
with chart_to_dataset (hierarchical relationship).
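
As a rough illustration of the hierarchy described above, the following sketch walks a flat edge list from a dashboard down to its warehouse table. The `Edge` shape, the node id format, and every edge type name other than `chart_to_dataset` are assumptions made for this example, not the command's actual output schema.

```typescript
// Hypothetical edge shape; node ids use a "kind:name" convention for readability.
type Edge = { from: string; to: string; type: string };

// Collect every node reachable from `start` by following edges downstream
// (dashboard -> chart -> dataset -> model -> table), in discovery order.
function upstreamOf(edges: Edge[], start: string): string[] {
  const visited: Record<string, boolean> = {};
  const order: string[] = [];
  const stack: string[] = [start];

  while (stack.length > 0) {
    const node = stack.pop() as string;
    edges.forEach((e) => {
      if (e.from === node && !visited[e.to]) {
        visited[e.to] = true;
        order.push(e.to);
        stack.push(e.to);
      }
    });
  }
  return order;
}

const sampleEdges: Edge[] = [
  { from: "dashboard:sales", to: "chart:revenue", type: "dashboard_to_chart" },
  { from: "chart:revenue", to: "dataset:ecommerce", type: "chart_to_dataset" },
  { from: "dataset:ecommerce", to: "model:orders", type: "dataset_to_model" },
  { from: "model:orders", to: "table:public.orders", type: "model_to_source" },
];

const lineagePath = upstreamOf(sampleEdges, "dashboard:sales");
```

With the explicit chart-to-dataset edge in place, a catalog can reconstruct the full chain without inferring the dataset from field-level references.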

Also adds sample output fixture for reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add AQL parsing to lineage extraction to fix under-reported lineage,
where chart dependencies hidden in calculations and metrics were missed.

Changes:
- Add extractAqlModelRefs() to parse model.field patterns from AQL strings
- Add extractAqlStrings() to find Heredoc content in viz blocks
- Update FieldReference type to include 'source' field (field_ref vs aql)
- Update parseDataset() to extract metrics and their AQL dependencies
- Add DatasetMetric type with models_referenced and fields_referenced

This addresses Problems #2 and #3 from LINEAGE_CHALLENGES.md:
- AQL expressions in chart calculations now traced
- Dataset-level metrics now have their model dependencies extracted
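
A minimal sketch of what regex-based `model.field` extraction can look like is shown below. The function name `extractModelFieldRefs` and the `FieldRef` shape are hypothetical stand-ins; the PR's `extractAqlModelRefs()` implementation is not reproduced here. The sketch also assumes that quoted spans in an expression are string literals and can be ignored.

```typescript
// Hypothetical reference shape, tagged with where the reference was found.
interface FieldRef {
  model: string;
  field: string;
  source: "aql";
}

// Find unique `model.field` identifier pairs in an expression string.
function extractModelFieldRefs(aql: string): FieldRef[] {
  // Assumption: drop quoted spans so text like 'a.b@example.com' is not matched.
  const stripped = aql.replace(/'[^']*'|"[^"]*"/g, "");
  const pattern = /\b([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)\b/g;

  const seen: Record<string, boolean> = {};
  const refs: FieldRef[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(stripped)) !== null) {
    const key = m[1] + "." + m[2];
    if (!seen[key]) {
      seen[key] = true;
      refs.push({ model: m[1], field: m[2], source: "aql" });
    }
  }
  return refs;
}
```

A pattern like this cannot tell a real model from any other dotted identifier, which is the kind of fragility that a type-checked extraction avoids.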

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CLI_CORE_PATH env var support for local development
- Add type-checked AQL extraction using cli-core utilities
- Fall back to regex-based extraction when utilities unavailable
- Pass dataset context through parsing for accurate type resolution

This enables more accurate model.field extraction from AQL expressions
using the same type checker as the Holistics frontend.
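
The fallback behavior described above could be structured roughly as follows. The `CoreLike` interface is an assumed shape: the function names `extractAqlReferences` and `createDatasetFromCompiled` come from this PR, but their real signatures in @holistics/cli-core are not shown here, and `makeExtractor` is a hypothetical helper.

```typescript
type Ref = { model: string; field: string };

// Assumed signatures, for illustration only.
interface CoreLike {
  createDatasetFromCompiled(compiled: unknown, datasetName: string): unknown;
  extractAqlReferences(dataset: unknown, aql: string): Ref[];
}

// Build an extractor that prefers the type-checked path and falls back to
// a regex-based function when cli-core is missing or throws.
function makeExtractor(
  core: CoreLike | undefined,
  compiled: unknown,
  regexFallback: (aql: string) => Ref[]
): (datasetName: string, aql: string) => Ref[] {
  // Cache one dataset object per name so several metrics reuse it.
  const datasetCache: Record<string, unknown> = {};

  return function extract(datasetName: string, aql: string): Ref[] {
    if (!core) return regexFallback(aql);
    try {
      if (!(datasetName in datasetCache)) {
        datasetCache[datasetName] = core.createDatasetFromCompiled(compiled, datasetName);
      }
      return core.extractAqlReferences(datasetCache[datasetName], aql);
    } catch (_err) {
      // Any failure in the type-checked path degrades to regex extraction.
      return regexFallback(aql);
    }
  };
}
```

Passing the core object in as a parameter makes it straightforward to substitute a mock in tests.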

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test regex fallback when cli-core not available
- Test type-checked extraction with mock cli-core
- Test dataset caching for multiple metrics
- Test fallback on errors from extractAqlReferences
- Test fallback on errors from createDatasetFromCompiled
- Test chart AQL extraction with dataset context
- Test charts without dataset reference (orphan charts)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --compiled <file> option to `aml lineage` for testing with
  pre-compiled JSON instead of running the compile subprocess
- Fix [dev] debug message to use console.error so it doesn't pollute
  stdout JSON output when the CLI is invoked as a subprocess
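
The stdout and stderr split matters because a parent process typically parses stdout as JSON. A small sketch of the pattern, using hypothetical helper names `debugLog` and `renderLineage` that are not taken from the PR:

```typescript
// Diagnostics go to stderr so they never mix with the JSON payload.
function debugLog(message: string): void {
  console.error("[dev] " + message);
}

// Serialize the lineage result; `compact` mirrors the idea of a --compact option.
function renderLineage(result: unknown, compact: boolean): string {
  return compact ? JSON.stringify(result) : JSON.stringify(result, null, 2);
}

debugLog("using local cli-core"); // written to stderr
const payload = renderLineage({ models: [], datasets: [] }, true);
console.log(payload); // the only thing written to stdout
```

A caller that reads stdout can then pass the output directly to `JSON.parse` without filtering log lines.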

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>