Skip to content

[BugFix] Add missing window function mappings for eventstats/streamstats#5305

Draft
songkant-aws wants to merge 2 commits intoopensearch-project:mainfrom
songkant-aws:bug-06a-window-func-mapping-5168
Draft

[BugFix] Add missing window function mappings for eventstats/streamstats#5305
songkant-aws wants to merge 2 commits intoopensearch-project:mainfrom
songkant-aws:bug-06a-window-func-mapping-5168

Conversation

@songkant-aws
Copy link
Copy Markdown
Collaborator

Description

row_number, rank, dense_rank, and nth_value were defined as enum values in BuiltinFunctionName but missing from WINDOW_FUNC_MAPPING, causing eventstats and streamstats to reject these valid window functions with "Unexpected window function" error.

Related Issues

Resolves #5168

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

row_number, rank, dense_rank, and nth_value were defined as enum values
in BuiltinFunctionName but missing from WINDOW_FUNC_MAPPING, causing
eventstats and streamstats to reject these valid window functions with
"Unexpected window function" error.

Resolves opensearch-project#5168

Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Core fix - add missing window function mappings and execution support

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java

Sub-PR theme: Integration tests for window function fixes in eventstats and streamstats

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLEventstatsIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteStreamstatsCommandIT.java
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5168.yml

⚡ Recommended focus areas for review

Missing WINDOW_FUNC_MAPPING

The entries for row_number, rank, dense_rank, nth, and nth_value are added to ALL_NATIVE_FUNCTIONS but the PR description mentions these were missing from WINDOW_FUNC_MAPPING. It should be verified that these entries are also properly added to WINDOW_FUNC_MAPPING (if it exists as a separate map), otherwise the fix may be incomplete or the mapping logic may differ from what is described.

.put("row_number", BuiltinFunctionName.ROW_NUMBER)
.put("rank", BuiltinFunctionName.RANK)
.put("dense_rank", BuiltinFunctionName.DENSE_RANK)
.put("nth", BuiltinFunctionName.NTH_VALUE)
.put("nth_value", BuiltinFunctionName.NTH_VALUE)
Null Return Path

When requiresWindowAggValidation returns false (for ROW_NUMBER, RANK, DENSE_RANK, NTH_VALUE), nodes is set to null. The subsequent return nodes != null ? PlanUtils.makeOver(...) : ... branch will then fall through to the else path. It should be verified that the else branch correctly handles these functions (e.g., calls PlanUtils.makeOver with no aggregation nodes), and that null is not inadvertently returned or causes a NullPointerException downstream.

requiresWindowAggValidation(functionName)
    ? PPLFuncImpTable.INSTANCE.validateAggFunctionSignature(
        functionName, field, args, context.rexBuilder)
    : null;
Incomplete Test Coverage

The YAML test file covers row_number, rank, and dense_rank but does not include a test case for nth_value (or its alias nth), which is also part of this bug fix. Consider adding a test case for nth_value to ensure full coverage.

"Issue 5168: eventstats row_number() should execute for grouped rows":
  - skip:
      features:
        - headers
  - do:
      headers:
        Content-Type: 'application/json'
      ppl:
        body:
          query: source=bounty-types | sort str_field | eventstats row_number() as rn by str_field | stats max(rn) as max_rn by str_field | sort str_field

  - match: { total: 2 }
  - match: { datarows: [ [ 1, alpha ], [ 1, beta ] ] }

---
"Issue 5168: streamstats rank() should execute for grouped rows":
  - skip:
      features:
        - headers
  - do:
      headers:
        Content-Type: 'application/json'
      ppl:
        body:
          query: source=bounty-types | streamstats rank() as rk by int_field | stats max(rk) as max_rk by int_field | sort int_field

  - match: { total: 3 }
  - match: { datarows: [ [ 1, -1 ], [ 1, 0 ], [ 1, 42 ] ] }

---
"Issue 5168: eventstats dense_rank() should execute for grouped rows":
  - skip:
      features:
        - headers
  - do:
      headers:
        Content-Type: 'application/json'
      ppl:
        body:
          query: source=bounty-types | sort str_field | eventstats dense_rank() as dr by str_field | stats max(dr) as max_dr by str_field | sort str_field

  - match: { total: 2 }
  - match: { datarows: [ [ 1, alpha ], [ 1, beta ] ] }
Weak Test Assertions

The testEventstatsRowNumberWindowFunction test expects rows(1, 1, "Canada") and rows(1, 1, "USA") for both min and max row numbers, suggesting only one row per country. If the test dataset has multiple rows per country, the max row_num should be greater than 1. The test data assumptions should be verified to ensure the assertions are meaningful and actually validate the window function behavior.

verifyDataRowsInOrder(actual, rows(1, 1, "Canada"), rows(1, 1, "USA"));

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect expected test result for row_number partition

The alpha group has two documents (_id: 1 and _id: 2), so row_number() partitioned
by str_field should produce values 1 and 2, making max(rn) equal to 2 for alpha, not
1. The expected value for alpha should be 2 to correctly reflect the window function
behavior over the partition.

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5168.yml [58]

-- match: { datarows: [ [ 1, alpha ], [ 1, beta ] ] }
+- match: { datarows: [ [ 2, alpha ], [ 1, beta ] ] }
Suggestion importance[1-10]: 8

__

Why: The alpha group has 2 documents, so row_number() partitioned by str_field should produce values 1 and 2, making max(rn) equal to 2 for alpha. The current expected value of [1, alpha] appears incorrect and would cause the test to fail or produce misleading results.

Medium
Fix incorrect exclusion of argument validation for NTH_VALUE

The NTH_VALUE case is incorrectly grouped with ROW_NUMBER, RANK, and DENSE_RANK as
not requiring validation. Unlike the ranking functions, NTH_VALUE takes arguments
(field and an index), and looking at PlanUtils.makeOver, it uses field and
argList.get(0) — meaning it does need the argument list validated. Skipping
validation for NTH_VALUE could lead to a NullPointerException or incorrect behavior
when argList is null or empty.

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java [611-616]

 private boolean requiresWindowAggValidation(BuiltinFunctionName functionName) {
   return switch (functionName) {
-    case ROW_NUMBER, RANK, DENSE_RANK, NTH_VALUE -> false;
+    case ROW_NUMBER, RANK, DENSE_RANK -> false;
     default -> true;
   };
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion raises a valid concern: NTH_VALUE requires field and argList.get(0) in PlanUtils.makeOver, so skipping validation could lead to null pointer issues. However, looking at the integration tests, nth(age, 2) works correctly, suggesting the validation bypass may be intentional or handled elsewhere. The concern about potential NPE is still worth addressing.

Medium

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] eventstats/streamstats reject window functions that grammar accepts

1 participant