feat(aml): add lineage command for extracting metadata #17
Open
nvquanghuy wants to merge 6 commits into master
Adds a new `holistics aml lineage` command that extracts lineage metadata from compiled AML projects and outputs a normalized JSON structure optimized for integration with data catalogs like DataHub.

Features:
- Parses TableModel and QueryModel entities with fields (dimensions/measures)
- Extracts Dataset and Dashboard entities with chart definitions
- Builds lineage edges: model->source, dataset->model, chart->model, dashboard->chart
- Supports multiple table name formats (BigQuery, PostgreSQL, simple)
- Options: --output file, --entities filter, --compact JSON

Includes comprehensive tests with vitest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
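To illustrate the "multiple table name formats" feature, here is a minimal sketch of how a raw table name might be normalized. The `TableRef` shape and the `parseTableName` helper are hypothetical names chosen for this example, not the code in this PR, and the sketch assumes that identifiers do not contain dots inside quotes.

```typescript
// Hypothetical normalized shape; the PR's actual types may differ.
interface TableRef {
  project?: string; // BigQuery project, when present
  schema?: string;  // BigQuery dataset or PostgreSQL schema
  table: string;
}

// Split a raw name on dots and strip the backticks or double quotes
// that commonly wrap identifiers.
function parseTableName(raw: string): TableRef {
  const parts = raw
    .trim()
    .split(".")
    .map((p) => p.replace(/^[`"]|[`"]$/g, ""));

  if (parts.some((p) => p.length === 0)) {
    throw new Error(`Empty identifier in table name: ${raw}`);
  }

  switch (parts.length) {
    case 1: // simple: table
      return { table: parts[0] };
    case 2: // PostgreSQL-style: schema.table
      return { schema: parts[0], table: parts[1] };
    case 3: // BigQuery-style: project.dataset.table
      return { project: parts[0], schema: parts[1], table: parts[2] };
    default:
      throw new Error(`Unsupported table name format: ${raw}`);
  }
}
```

Throwing on unrecognized formats keeps bad names out of the lineage output; a real implementation might prefer to record a warning and continue.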
Adds explicit chart→dataset relationship for proper hierarchy:
- Dashboard → Chart → Dataset → Model → DW Table

This complements chart_to_model (granular field-level lineage) with chart_to_dataset (hierarchical relationship).

Also adds sample output fixture for reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
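The hierarchy above can be sketched as a walk over typed edges. This is an illustration only: `chart_to_dataset` is named in the commit message, but the other edge type names, the `Edge` shape, the node ID format, and the `upstreamOf` helper are assumptions made for this example.

```typescript
// Edge type names other than chart_to_dataset are assumed for illustration.
type EdgeType =
  | "dashboard_to_chart"
  | "chart_to_dataset"
  | "dataset_to_model"
  | "model_to_source";

interface Edge {
  type: EdgeType;
  from: string; // downstream node ID
  to: string;   // upstream node ID
}

// Breadth-first walk that collects every node upstream of `start`.
function upstreamOf(edges: Edge[], start: string): string[] {
  const seen = new Set<string>();
  const queue: string[] = [start];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    for (const e of edges) {
      if (e.from === node && !seen.has(e.to)) {
        seen.add(e.to);
        queue.push(e.to);
      }
    }
  }
  return Array.from(seen);
}

// Hypothetical IDs following Dashboard → Chart → Dataset → Model → DW Table.
const sampleEdges: Edge[] = [
  { type: "dashboard_to_chart", from: "dashboard:sales", to: "chart:revenue" },
  { type: "chart_to_dataset", from: "chart:revenue", to: "dataset:ecommerce" },
  { type: "dataset_to_model", from: "dataset:ecommerce", to: "model:orders" },
  { type: "model_to_source", from: "model:orders", to: "table:public.orders" },
];
```

With the explicit chart→dataset edge in place, a catalog can reach the warehouse table from a dashboard by following one edge per level, without depending on field-level chart_to_model edges.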
Add AQL parsing to lineage extraction to solve the understatement problem, where chart dependencies hidden in calculations and metrics were missed.

Changes:
- Add extractAqlModelRefs() to parse model.field patterns from AQL strings
- Add extractAqlStrings() to find Heredoc content in viz blocks
- Update FieldReference type to include 'source' field (field_ref vs aql)
- Update parseDataset() to extract metrics and their AQL dependencies
- Add DatasetMetric type with models_referenced and fields_referenced

This addresses Problems #2 and #3 from LINEAGE_CHALLENGES.md:
- AQL expressions in chart calculations now traced
- Dataset-level metrics now have their model dependencies extracted

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
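As a rough illustration of regex-based `model.field` extraction, the sketch below scans an AQL string for dotted identifier pairs and de-duplicates them. The function name `sketchExtractModelRefs`, the `AqlFieldRef` shape, and the sample expression are assumptions for this example; the actual `extractAqlModelRefs()` may work differently.

```typescript
// Hypothetical result shape for one model.field reference.
interface AqlFieldRef {
  model: string;
  field: string;
}

// Find identifier.identifier pairs in an AQL string, keeping first-seen order.
// Identifiers must start with a letter or underscore, so decimals such as
// 1.5 are not matched.
function sketchExtractModelRefs(aql: string): AqlFieldRef[] {
  const pattern = /\b([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)\b/g;
  const seen = new Set<string>();
  const refs: AqlFieldRef[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(aql)) !== null) {
    const key = `${m[1]}.${m[2]}`;
    if (!seen.has(key)) {
      seen.add(key);
      refs.push({ model: m[1], field: m[2] });
    }
  }
  return refs;
}

// Illustrative expression, not taken from the PR.
const sampleRefs = sketchExtractModelRefs(
  "sum(orders.amount) / count(users.id) + sum(orders.amount) * 1.5"
);
```

A pattern like this cannot tell a real reference from a dotted token inside a string literal or comment, which is the kind of fragility that motivates a type-checked extractor.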
- Add CLI_CORE_PATH env var support for local development
- Add type-checked AQL extraction using cli-core utilities
- Fall back to regex-based extraction when utilities unavailable
- Pass dataset context through parsing for accurate type resolution

This enables more accurate model.field extraction from AQL expressions using the same type checker as the Holistics frontend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
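The fallback behavior described above can be sketched as a small wrapper: use the primary extractor when one is available, and switch to the fallback if it is missing or throws. The `Extractor` type and `withFallback` helper are hypothetical names for this example and do not reflect the cli-core API.

```typescript
// Hypothetical extractor signature: AQL string in, "model.field" strings out.
type Extractor = (aql: string) => string[];

// Prefer `primary` when it is available; use `fallback` if it is missing
// or throws. Diagnostics go to stderr so stdout stays clean.
function withFallback(primary: Extractor | undefined, fallback: Extractor): Extractor {
  return (aql: string): string[] => {
    if (primary) {
      try {
        return primary(aql);
      } catch (err) {
        console.error(`primary extraction failed, using fallback: ${String(err)}`);
      }
    }
    return fallback(aql);
  };
}

// Stand-in extractors for demonstration only.
const regexStandIn: Extractor = () => ["regex.result"];
const typedStandIn: Extractor = () => ["typed.result"];
const failingStandIn: Extractor = () => {
  throw new Error("type checker unavailable");
};
```

Wrapping both paths behind a single function type keeps callers unaware of which extractor actually ran, so the rest of the parsing code does not need to branch on whether cli-core is present.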
- Test regex fallback when cli-core not available
- Test type-checked extraction with mock cli-core
- Test dataset caching for multiple metrics
- Test fallback on errors from extractAqlReferences
- Test fallback on errors from createDatasetFromCompiled
- Test chart AQL extraction with dataset context
- Test charts without dataset reference (orphan charts)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --compiled <file> option to `aml lineage` for testing with pre-compiled JSON instead of running the compile subprocess
- Fix [dev] debug message to use console.error so it doesn't pollute stdout JSON output when the CLI is invoked as a subprocess

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `holistics aml lineage [path]` command to extract lineage metadata from compiled AML projects
- `@holistics/cli-core` AQL type checker to accurately extract `model.field` references from dataset metric definitions (instead of fragile regex)
- `--output <file>`, `--entities <filter>`, `--compact`, `--compiled <file>`

Changes
- `extractAqlReferences()` from cli-core when available, falls back to regex
- `aml lineage` subcommand; adds `--compiled` flag for testing with pre-compiled JSON
- `CLI_CORE_PATH` env var; debug log goes to stderr to avoid polluting JSON output

Dependencies
- `extractAqlReferences()` / `createDatasetFromCompiled()` exports from `@holistics/cli-core`

Test plan
- `pnpm test`: all tests pass
- `CLI_CORE_PATH=/path/to/cli-core pnpm cli aml lineage /path/to/aml-project`: AQL extraction works with local cli-core
- `pnpm cli aml lineage --compiled compiled.json`: `--compiled` flag loads pre-compiled JSON
- `fields_referenced` correctly lists model/field pairs from AQL expressions
- `--output`, `--entities`, `--compact` options

🤖 Generated with Claude Code