Merged
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
@@ -1,2 +1,2 @@
# This should match the owning team set up in https://github.com/orgs/opensearch-project/teams
-* @ps48 @kavithacm @derek-ho @joshuali925 @dai-chen @YANG-DB @mengweieric @vamsimanohar @swiddis @penghuo @seankao-az @MaxKsyunz @Yury-Fridlyand @anirudha @forestmvey @acarbonetto @GumpacG @ykmr1224 @LantaoJin @noCharger @qianheng-aws @yuancu @RyanL1997 @ahkcs
+* @ps48 @joshuali925 @dai-chen @mengweieric @vamsimanohar @swiddis @penghuo @anirudha @acarbonetto @ykmr1224 @LantaoJin @noCharger @qianheng-aws @yuancu @RyanL1997 @ahkcs @songkant-aws
1 change: 1 addition & 0 deletions .github/workflows/stalled.yml
@@ -27,3 +27,4 @@ jobs:
days-before-pr-close: -1
days-before-issue-close: -1
exempt-draft-pr: true
exempt-pr-labels: 'no-stall'
159 changes: 159 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,159 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

OpenSearch SQL plugin — enables SQL and PPL (Piped Processing Language) queries against OpenSearch. This is a multi-module Gradle project (Java 21) that functions as an OpenSearch plugin.

## Build Commands

```bash
# Full build (compiles, tests, checks)
./gradlew build

# Fast build (skip integration tests)
./gradlew build -x integTest

# Build specific module
./gradlew :core:build
./gradlew :sql:build
./gradlew :ppl:build

# Run unit tests only
./gradlew test

# Run a single unit test class
./gradlew :core:test --tests "org.opensearch.sql.analysis.AnalyzerTest"

# Run integration tests
./gradlew :integ-test:integTest

# Run a single integration test
./gradlew :integ-test:integTest -Dtests.class="*QueryIT"

# Skip Prometheus if unavailable
./gradlew :integ-test:integTest -DignorePrometheus

# Code formatting
./gradlew spotlessCheck # Check
./gradlew spotlessApply # Auto-fix

# Regenerate ANTLR parsers from grammar files
./gradlew generateGrammarSource

# Run plugin locally with OpenSearch
./gradlew :opensearch-sql-plugin:run
./gradlew :opensearch-sql-plugin:run -DdebugJVM # With remote debug on port 5005

# Run doctests
./gradlew :doctest:doctest
./gradlew :doctest:doctest -Pdocs=search # Single file
```

## Code Style

- **Google Java Format** enforced via Spotless (2-space indent, 100 char line limit)
- **Lombok** is used throughout — `@Getter`, `@Builder`, `@RequiredArgsConstructor`, etc.
- **License header** required on all Java files (Apache 2.0). Missing headers fail the build.
- Pre-commit hooks run `spotlessApply` automatically
- All commits must include a DCO sign-off: `Signed-off-by: Name <email>` (use `git commit -s`).

## Architecture

### Query Pipeline

```
User Query (SQL/PPL)
→ Parsing (ANTLR) — produces parse tree
→ AST Construction (AstBuilder visitor) — produces UnresolvedPlan
→ Semantic Analysis (Analyzer) — resolves symbols/types → LogicalPlan
→ Planning (Planner + LogicalPlanOptimizer) — produces PhysicalPlan
→ Execution (ExecutionEngine) — streams ExprValue results
→ Response Formatting (ResponseFormatter — JSON/CSV/JDBC)
```
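
The stages above can be read as a chain of functions, each consuming the previous stage's output. The following is a minimal sketch of that shape only: every type and method name in it (`ParseTree`, `Unresolved`, `CATALOG`, `run`, and so on) is a hypothetical stand-in, not one of the plugin's real classes, and the "parser" is a one-line regex rather than ANTLR.

```java
import java.util.List;
import java.util.Map;

public class PipelineSketch {
  // Hypothetical stand-ins for each stage's output; not the plugin's real types.
  record ParseTree(String text) {}
  record Unresolved(String table) {}
  record Logical(String table) {}
  record Physical(List<String> rows) {}

  // Toy catalog playing the role of the storage engine (assumption for illustration).
  static final Map<String, List<String>> CATALOG = Map.of("logs", List.of("200", "404"));

  // Parsing: raw text -> parse tree.
  static ParseTree parse(String query) {
    return new ParseTree(query.trim());
  }

  // AST construction: parse tree -> unresolved plan (only "source = <table>" is understood).
  static Unresolved buildAst(ParseTree tree) {
    return new Unresolved(tree.text().replaceFirst("^source\\s*=\\s*", ""));
  }

  // Semantic analysis: resolve the table name against the catalog.
  static Logical analyze(Unresolved plan) {
    if (!CATALOG.containsKey(plan.table())) {
      throw new IllegalArgumentException("Unknown table: " + plan.table());
    }
    return new Logical(plan.table());
  }

  // Planning: bind the logical plan to concrete rows.
  static Physical plan(Logical plan) {
    return new Physical(CATALOG.get(plan.table()));
  }

  // Execution: produce the result rows.
  static List<String> execute(Physical plan) {
    return plan.rows();
  }

  // Response formatting: rows -> text.
  static String format(List<String> rows) {
    return String.join(",", rows);
  }

  // The whole pipeline, innermost call first.
  static String run(String query) {
    return format(execute(plan(analyze(buildAst(parse(query))))));
  }
}
```

In this toy version, an unknown table fails during analysis rather than at execution, which mirrors the ordering in the diagram: symbols are resolved before any physical plan exists.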

### Module Dependency Graph

```
plugin (OpenSearch plugin entry point, Guice DI wiring)
├── sql — SQL parsing (ANTLR → AST via SQLSyntaxParser/AstBuilder)
├── ppl — PPL parsing (ANTLR → AST via PPLSyntaxParser/AstBuilder)
├── core — Central module: Analyzer, Planner, ExecutionEngine interfaces,
│ AST/LogicalPlan/PhysicalPlan node types, expression system, type system
├── opensearch — OpenSearch storage engine, execution engine, client
├── protocol — Response formatters (JSON, CSV, JDBC, YAML)
├── common — Shared settings and utilities
├── legacy — V1 SQL engine (backward compatibility fallback)
├── datasources — Multi-datasource support (Glue, Security Lake, Prometheus)
├── async-query / async-query-core — Spark-based async query execution
├── direct-query / direct-query-core — Direct external datasource queries
└── language-grammar — Centralized ANTLR .g4 grammar files
```

`core` has no dependency on other modules. `sql` and `ppl` depend on `core` and `language-grammar`. `opensearch` implements `core` interfaces.

### Key Source Locations

| Area | Key Files |
|------|-----------|
| Plugin entry | `plugin/.../SQLPlugin.java`, `plugin/.../OpenSearchPluginModule.java` |
| SQL parsing | `sql/.../sql/parser/AstBuilder.java`, `sql/.../SQLService.java` |
| PPL parsing | `ppl/.../ppl/parser/AstBuilder.java`, `ppl/.../PPLService.java` |
| ANTLR grammars | `language-grammar/src/main/antlr4/` (OpenSearchSQLParser.g4, OpenSearchPPLParser.g4) |
| Analysis | `core/.../analysis/Analyzer.java`, `core/.../analysis/ExpressionAnalyzer.java` |
| Planning | `core/.../planner/Planner.java`, `core/.../planner/logical/LogicalPlan.java` |
| Execution | `core/.../executor/ExecutionEngine.java`, `opensearch/.../OpenSearchExecutionEngine.java` |
| Storage | `opensearch/.../storage/OpenSearchStorageEngine.java` |
| Query orchestration | `core/.../executor/QueryService.java`, `core/.../executor/QueryPlanFactory.java` |

### Core Abstractions

- **`Node<T>`** — Base AST node with visitor pattern support
- **`UnresolvedPlan`** / **`LogicalPlan`** / **`PhysicalPlan`** — Query plan hierarchy (unresolved → logical → physical)
- **`Expression`** — Resolved expression with `valueOf()` and `type()`
- **`ExprValue`** — Runtime value types (ExprIntegerValue, ExprStringValue, etc.)
- **`ExprType`** — Type system (DATE, TIMESTAMP, DOUBLE, STRUCT, etc.)
- **`StorageEngine`** / **`Table`** — Pluggable storage abstraction
- **`ExecutionEngine`** — Executes physical plans, returns QueryResponse
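
To make the `Expression` / `ExprValue` / `ExprType` relationship concrete, here is a small self-contained sketch. It assumes a simplified shape (an expression exposes `valueOf` against an environment plus a static `type`); the names `Type`, `Value`, `Literal`, `Ref`, and `Add` are illustrative and the signatures are not copied from the `core` module.

```java
import java.util.Map;

public class ExpressionSketch {
  // Hypothetical, reduced type system.
  enum Type { INTEGER, STRING }

  // Hypothetical runtime value: a type tag plus the raw Java object.
  record Value(Type type, Object raw) {}

  // A resolved expression can be evaluated and knows its result type.
  interface Expression {
    Value valueOf(Map<String, Value> env);

    Type type();
  }

  // Constant expression.
  record Literal(Value value) implements Expression {
    @Override
    public Value valueOf(Map<String, Value> env) {
      return value;
    }

    @Override
    public Type type() {
      return value.type();
    }
  }

  // Field reference; the record accessor type() satisfies Expression.type().
  record Ref(String name, Type type) implements Expression {
    @Override
    public Value valueOf(Map<String, Value> env) {
      return env.get(name);
    }
  }

  // Integer addition over two sub-expressions.
  record Add(Expression left, Expression right) implements Expression {
    @Override
    public Value valueOf(Map<String, Value> env) {
      int l = (Integer) left.valueOf(env).raw();
      int r = (Integer) right.valueOf(env).raw();
      return new Value(Type.INTEGER, l + r);
    }

    @Override
    public Type type() {
      return Type.INTEGER;
    }
  }
}
```

The point of the split is that `type()` is available before execution (for analysis), while `valueOf` is only called per row at runtime.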

### Design Patterns

- **Visitor pattern** used pervasively: `AbstractNodeVisitor`, `LogicalPlanNodeVisitor`, `PhysicalPlanNodeVisitor`, `ExpressionNodeVisitor`
- **PhysicalPlan** implements `Iterator<ExprValue>` for streaming execution
- **Guice** dependency injection in `OpenSearchPluginModule`
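
The first two patterns can be sketched in a few lines. This is an illustration under assumptions, not the project's code: `NodeVisitor`, `Relation`, `Filter`, `Explainer`, and `FilterOperator` are hypothetical names, and the operator iterates over plain `Integer` rows instead of `ExprValue`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

public class VisitorSketch {
  // Visitor with one method per node type; T is the result, C an optional context.
  interface NodeVisitor<T, C> {
    T visitRelation(Relation node, C context);

    T visitFilter(Filter node, C context);
  }

  interface Node {
    <T, C> T accept(NodeVisitor<T, C> visitor, C context);
  }

  record Relation(String table) implements Node {
    @Override
    public <T, C> T accept(NodeVisitor<T, C> visitor, C context) {
      return visitor.visitRelation(this, context);
    }
  }

  record Filter(Node child, String condition) implements Node {
    @Override
    public <T, C> T accept(NodeVisitor<T, C> visitor, C context) {
      return visitor.visitFilter(this, context);
    }
  }

  // One concrete visitor: renders the tree as text, recursing through accept().
  static class Explainer implements NodeVisitor<String, Void> {
    @Override
    public String visitRelation(Relation node, Void context) {
      return "Relation(" + node.table() + ")";
    }

    @Override
    public String visitFilter(Filter node, Void context) {
      return "Filter(" + node.condition() + ") <- " + node.child().accept(this, context);
    }
  }

  // Iterator-style operator: pulls rows lazily from its input and keeps matching ones.
  static class FilterOperator implements Iterator<Integer> {
    private final Iterator<Integer> input;
    private final Predicate<Integer> predicate;
    private Integer buffered;

    FilterOperator(Iterator<Integer> input, Predicate<Integer> predicate) {
      this.input = input;
      this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
      while (buffered == null && input.hasNext()) {
        Integer candidate = input.next();
        if (predicate.test(candidate)) {
          buffered = candidate;
        }
      }
      return buffered != null;
    }

    @Override
    public Integer next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      Integer row = buffered;
      buffered = null;
      return row;
    }
  }

  // Helper that drains an operator into a list.
  static List<Integer> drain(Iterator<Integer> rows) {
    List<Integer> out = new ArrayList<>();
    rows.forEachRemaining(out::add);
    return out;
  }
}
```

Because each operator is itself an `Iterator`, operators compose by wrapping one another, and rows are only produced when the consumer asks for them.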

## Adding New PPL Commands

Follow the checklist in `docs/dev/ppl-commands.md`:
1. Update lexer/parser grammars (OpenSearchPPLLexer.g4, OpenSearchPPLParser.g4)
2. Add AST node under `org.opensearch.sql.ast.tree`
3. Add `visit*` method in `AbstractNodeVisitor`, override in `Analyzer`, `CalciteRelNodeVisitor`, `PPLQueryDataAnonymizer`
4. Add unit tests extending `CalcitePPLAbstractTest` (include `verifyLogical()` and `verifyPPLToSparkSQL()`)
5. Add integration tests extending `PPLIntegTestCase`
6. Add user docs under `docs/user/ppl/cmd/`

## Adding New PPL Functions

Follow `docs/dev/ppl-functions.md`. Three approaches:
1. Reuse existing Calcite operators from `SqlStdOperatorTable`/`SqlLibraryOperators`
2. Adapt static Java methods via `UserDefinedFunctionUtils.adapt*ToUDF`
3. Implement `ImplementorUDF` interface from scratch, register in `PPLBuiltinOperators`

## Calcite Engine

The project has two execution engines: the legacy **v2 engine** and the newer **Calcite engine** (based on Apache Calcite). Calcite is toggled via the `plugins.calcite.enabled` setting (off by default in production; integration tests toggle it per test).

- In integration tests, call `enableCalcite()` in `init()` to activate the Calcite path
- Some features (e.g., graphLookup) require pushdown optimization — use `enabledOnlyWhenPushdownIsEnabled()` to skip tests in the `CalciteNoPushdownIT` suite
- `CalciteNoPushdownIT` is a JUnit `@Suite` that re-runs Calcite test classes with pushdown disabled; add new test classes to its `@Suite.SuiteClasses` list

## Integration Tests

Located in `integ-test/src/test/java/` and organized by area: `sql/`, `ppl/`, `calcite/`, `legacy/`, `jdbc/`, `datasource/`, `asyncquery/`, `security/`. Tests use the OpenSearch test framework (in-memory cluster per test class). YAML REST tests live in `integ-test/src/yamlRestTest/resources/rest-api-spec/test/`.

Key base classes:
- `PPLIntegTestCase` — base for PPL integration tests (v2 engine)
- `CalcitePPLIT` — base for Calcite PPL integration tests (calls `enableCalcite()`)
- `CalcitePPLAbstractTest` — base for Calcite PPL unit tests (`verifyLogical()`, `verifyPPLToSparkSQL()`)
- `CalciteExplainIT` — explain plan tests using YAML expected output files in `integ-test/src/test/resources/expectedOutput/calcite/`
38 changes: 19 additions & 19 deletions MAINTAINERS.md
@@ -9,14 +9,10 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
| Eric Wei | [mengweieric](https://github.com/mengweieric) | Amazon |
| Joshua Li | [joshuali925](https://github.com/joshuali925) | Amazon |
| Shenoy Pratik | [ps48](https://github.com/ps48) | Amazon |
-| Kavitha Mohan | [kavithacm](https://github.com/kavithacm) | Amazon |
-| Derek Ho | [derek-ho](https://github.com/derek-ho) | Amazon |
-| Lior Perry | [YANG-DB](https://github.com/YANG-DB) | Amazon |
| Simeon Widdis | [swiddis](https://github.com/swiddis) | Amazon |
| Chen Dai | [dai-chen](https://github.com/dai-chen) | Amazon |
| Vamsi Manohar | [vamsimanohar](https://github.com/vamsimanohar) | Amazon |
| Peng Huo | [penghuo](https://github.com/penghuo) | Amazon |
-| Sean Kao | [seankao-az](https://github.com/seankao-az) | Amazon |
| Anirudha Jadhav | [anirudha](https://github.com/anirudha) | Amazon |
| Tomoyuki Morita | [ykmr1224](https://github.com/ykmr1224) | Amazon |
| Lantao Jin | [LantaoJin](https://github.com/LantaoJin) | Amazon |
@@ -25,22 +21,26 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
| Yuanchun Shen | [yuancu](https://github.com/yuancu) | Amazon |
| Ryan Liang | [RyanL1997](https://github.com/RyanL1997) | Amazon |
| Kai Huang | [ahkcs](https://github.com/ahkcs) | Amazon |
-| Max Ksyunz | [MaxKsyunz](https://github.com/MaxKsyunz) | Improving |
-| Yury Fridlyand | [Yury-Fridlyand](https://github.com/Yury-Fridlyand) | Improving |
+| Songkan Tang | [songkant-aws](https://github.com/songkant-aws) | Amazon |
| Andrew Carbonetto | [acarbonetto](https://github.com/acarbonetto) | Improving |
-| Forest Vey | [forestmvey](https://github.com/forestmvey) | Improving |
-| Guian Gumpac | [GumpacG](https://github.com/GumpacG) | Improving |

## Emeritus Maintainers

-| Maintainer | GitHub ID | Affiliation |
-| ----------------- | ------------------------------------------------------- | ----------- |
-| Charlotte Henkle | [CEHENKLE](https://github.com/CEHENKLE) | Amazon |
-| Nick Knize | [nknize](https://github.com/nknize) | Amazon |
-| David Cui | [davidcui1225](https://github.com/davidcui1225) | Amazon |
-| Eugene Lee | [eugenesk24](https://github.com/eugenesk24) | Amazon |
-| Zhongnan Su | [zhongnansu](https://github.com/zhongnansu) | Amazon |
-| Chloe Zhang | [chloe-zh](https://github.com/chloe-zh) | Amazon |
-| Peter Fitzgibbons | [pjfitzgibbons](https://github.com/pjfitzgibbons) | Amazon |
-| Rupal Mahajan | [rupal-bq](https://github.com/rupal-bq) | Amazon |

+| Maintainer | GitHub ID |
+| ----------------- | ------------------------------------------------------- |
+| Charlotte Henkle | [CEHENKLE](https://github.com/CEHENKLE) |
+| Nick Knize | [nknize](https://github.com/nknize) |
+| David Cui | [davidcui1225](https://github.com/davidcui1225) |
+| Eugene Lee | [eugenesk24](https://github.com/eugenesk24) |
+| Zhongnan Su | [zhongnansu](https://github.com/zhongnansu) |
+| Chloe Zhang | [chloe-zh](https://github.com/chloe-zh) |
+| Peter Fitzgibbons | [pjfitzgibbons](https://github.com/pjfitzgibbons) |
+| Rupal Mahajan | [rupal-bq](https://github.com/rupal-bq) |
+| Kavitha Mohan | [kavithacm](https://github.com/kavithacm) |
+| Derek Ho | [derek-ho](https://github.com/derek-ho) |
+| Lior Perry | [YANG-DB](https://github.com/YANG-DB) |
+| Sean Kao | [seankao-az](https://github.com/seankao-az) |
+| Max Ksyunz | [MaxKsyunz](https://github.com/MaxKsyunz) |
+| Yury Fridlyand | [Yury-Fridlyand](https://github.com/Yury-Fridlyand) |
+| Forest Vey | [forestmvey](https://github.com/forestmvey) |
+| Guian Gumpac | [GumpacG](https://github.com/GumpacG) |
80 changes: 75 additions & 5 deletions api/README.md
@@ -8,7 +8,8 @@ This module provides components organized into two main areas aligned with the [

### Unified Language Specification

-- **`UnifiedQueryPlanner`**: Accepts PPL (Piped Processing Language) queries and returns Calcite `RelNode` logical plans as intermediate representation.
+- **`UnifiedQueryParser`**: Parses PPL (Piped Processing Language) or SQL queries and returns the native parse result (`UnresolvedPlan` for PPL, `SqlNode` for Calcite SQL).
+- **`UnifiedQueryPlanner`**: Accepts PPL or SQL queries and returns Calcite `RelNode` logical plans as intermediate representation.
- **`UnifiedQueryTranspiler`**: Converts Calcite logical plans (`RelNode`) into SQL strings for various target databases using different SQL dialects.

### Unified Execution Runtime
@@ -17,7 +18,7 @@ This module provides components organized into two main areas aligned with the [
- **`UnifiedFunction`**: Engine-agnostic function interface that enables functions to be evaluated across different execution engines without engine-specific code duplication.
- **`UnifiedFunctionRepository`**: Repository for discovering and loading functions as `UnifiedFunction` instances, providing a bridge between function definitions and external execution engines.

-Together, these components enable complete workflows: parse PPL queries into logical plans, transpile those plans into target database SQL, compile and execute queries directly, or export PPL functions for use in external execution engines.
+Together, these components enable complete workflows: parse PPL or SQL queries into logical plans, transpile those plans into target database SQL, compile and execute queries directly, or export PPL functions for use in external execution engines.

### Experimental API Design

@@ -33,7 +34,7 @@ Create a context with catalog configuration, query type, and optional settings:

```java
UnifiedQueryContext context = UnifiedQueryContext.builder()
-    .language(QueryType.PPL)
+    .language(QueryType.PPL) // or QueryType.SQL for SQL
     .catalog("opensearch", opensearchSchema)
     .catalog("spark_catalog", sparkSchema)
     .defaultNamespace("opensearch")
@@ -42,9 +43,23 @@ UnifiedQueryContext context = UnifiedQueryContext.builder()
.build();
```

### UnifiedQueryParser

Use `UnifiedQueryParser` to parse queries into their native parse tree. The parser is owned by `UnifiedQueryContext` and returns the native parse result for each language.

```java
// PPL parsing
UnresolvedPlan ast = (UnresolvedPlan) context.getParser().parse("source = logs | where status = 200");

// SQL parsing (with QueryType.SQL context)
SqlNode sqlNode = (SqlNode) sqlContext.getParser().parse("SELECT * FROM logs WHERE status = 200");
```

Callers can then use each language's native visitor infrastructure (`AbstractNodeVisitor` for PPL, `SqlBasicVisitor` for Calcite SQL) on the typed result for further analysis.

### UnifiedQueryPlanner

-Use `UnifiedQueryPlanner` to parse and analyze PPL queries into Calcite logical plans. The planner accepts a `UnifiedQueryContext` and can be reused for multiple queries.
+Use `UnifiedQueryPlanner` to parse and analyze PPL or SQL queries into Calcite logical plans. The planner accepts a `UnifiedQueryContext` and can be reused for multiple queries.

```java
// Create planner with context
@@ -53,6 +68,9 @@ UnifiedQueryPlanner planner = new UnifiedQueryPlanner(context);
// Plan multiple queries (context is reused)
RelNode plan1 = planner.plan("source = logs | where status = 200");
RelNode plan2 = planner.plan("source = metrics | stats avg(cpu)");

// SQL queries are also supported (with QueryType.SQL context)
RelNode plan3 = planner.plan("SELECT * FROM logs WHERE status = 200");
```

### UnifiedQueryTranspiler
@@ -176,6 +194,59 @@ try (UnifiedQueryContext context = UnifiedQueryContext.builder()
}
```

## Profiling

The unified query API supports the same [profiling capability](../docs/user/ppl/interfaces/endpoint.md#profile-experimental) as the PPL REST endpoint. When enabled, each unified query component automatically collects per-phase timing metrics. For code outside unified query components (e.g., `PreparedStatement.executeQuery()` or response formatting), `context.measure()` records custom phases into the same profile.

```java
try (UnifiedQueryContext context = UnifiedQueryContext.builder()
    .language(QueryType.PPL)
    .catalog("catalog", schema)
    .defaultNamespace("catalog")
    .profiling(true)
    .build()) {

  // Auto-profiled: ANALYZE
  RelNode plan = new UnifiedQueryPlanner(context).plan(query);

  // Auto-profiled: OPTIMIZE
  PreparedStatement stmt = new UnifiedQueryCompiler(context).compile(plan);

  // User-profiled via measure()
  ResultSet rs = context.measure(MetricName.EXECUTE, stmt::executeQuery);
  String json = context.measure(MetricName.FORMAT, () -> formatter.format(result));

  // Retrieve profile snapshot
  QueryProfile profile = context.getProfile();
}
```

The returned `QueryProfile` follows the same JSON structure as the REST API:

```json
{
  "summary": {
    "total_time_ms": 33.34
  },
  "phases": {
    "analyze": { "time_ms": 8.68 },
    "optimize": { "time_ms": 18.2 },
    "execute": { "time_ms": 4.87 },
    "format": { "time_ms": 0.05 }
  },
  "plan": {
    "node": "EnumerableCalc",
    "time_ms": 4.82,
    "rows": 2,
    "children": [
      { "node": "CalciteEnumerableIndexScan", "time_ms": 4.12, "rows": 2 }
    ]
  }
}
```

When profiling is disabled (the default), all components execute with zero overhead.
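
For readers wondering what a `measure()`-style helper does, the sketch below shows one plausible shape: time a callback, accumulate the elapsed time under a phase name, and skip the bookkeeping entirely when profiling is off. It is not the module's implementation. `PhaseProfilerSketch`, its string phase names, and the use of `Supplier` (which cannot throw checked exceptions, unlike the `stmt::executeQuery` call above) are all assumptions made to keep the example self-contained.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PhaseProfilerSketch {
  private final boolean enabled;
  // Accumulated nanoseconds per phase, in first-seen order.
  private final Map<String, Long> phaseNanos = new LinkedHashMap<>();

  public PhaseProfilerSketch(boolean enabled) {
    this.enabled = enabled;
  }

  // Runs the action and, if profiling is on, records how long it took under the phase name.
  public <T> T measure(String phase, Supplier<T> action) {
    if (!enabled) {
      return action.get(); // no timing calls and no map writes when disabled
    }
    long start = System.nanoTime();
    try {
      return action.get();
    } finally {
      // Recorded even if the action throws; repeated phases are summed.
      phaseNanos.merge(phase, System.nanoTime() - start, Long::sum);
    }
  }

  // Read-only copy of the timings collected so far.
  public Map<String, Long> snapshot() {
    return Collections.unmodifiableMap(new LinkedHashMap<>(phaseNanos));
  }
}
```

A caller would wrap each custom phase, for example `profiler.measure("format", () -> render(rows))` with a hypothetical `render` method, and read `snapshot()` once at the end of the request.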

## Development & Testing

A set of unit tests is provided to validate planner behavior.
@@ -226,5 +297,4 @@ public class MySchema extends AbstractSchema {

## Future Work

-- Expand support to SQL language.
- Extend planner to generate optimized physical plans using Calcite's optimization frameworks.