
update ppl doc examples to use otel data#5303

Draft
ritvibhatt wants to merge 3 commits into opensearch-project:main from ritvibhatt:update-ppl-examples

Conversation

@ritvibhatt
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

PR Reviewer Guide 🔍

(Review updated until commit f0e3a6a)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add playground button support to docs exporter script

Relevant files:

  • scripts/docs_exporter/export_to_docs_website.py

Sub-PR theme: Update PPL documentation examples to use OTel log data

Relevant files:

  • doctest/test_data/otellogs.json
  • docs/user/ppl/cmd/search.md
  • docs/user/ppl/cmd/stats.md
  • docs/user/ppl/cmd/bin.md

⚡ Recommended focus areas for review

Corrupted Content

The stats.md file has severely corrupted content where query result table data has been accidentally embedded inline within the aggregation function list (e.g., VAR_SAMfetched rows / total rows..., STDDEV_SAMfetched rows..., DISTINCT_COUNT_Afetched rows...). This makes the documentation unreadable and the function list broken. The entire aggregation functions section needs to be reviewed and corrected.

```text
* `VAR_SAMfetched rows / total rows = 5/5
+-----+-----------+--------------+
| p90 | sev_range | severityText |
|-----+-----------+--------------|
| 5   | 0         | DEBUG        |
| 9   | 0         | INFO         |
| 17  | 10        | ERROR        |
| 13  | 10        | WARN         |
| 21  | 20        | ERROR        |
+-----+-----------+--------------+` -- Sample variance

[... excerpt truncated: the same pattern of result tables injected mid-word repeats through the `VAR_POP`, `STDDEV_SAMP`, `STDDEV_POP`, `DISTINCT_COUNT_APPROX`, `PERCENTILE`/`PERCENTILE_APPROX`, and `PERC<percent>`/`P<percent>` entries ...]

* `TAKE` -- List of original values
* `MEDIAN` -- 50th percentile
* `EARLIEST` -- Earliest value by timestamp
* `LATEST` -- Latest value by timestamp
```
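For reference, the sample-versus-population distinction the (intended) bullet entries draw comes down to the divisor (n-1 versus n). A quick check with Python's standard library, using hypothetical values:

```python
import statistics

data = [5, 9, 13, 17, 21]  # hypothetical severityNumber values, mean = 13

# Sample statistics divide the squared deviations by n-1;
# population statistics divide by n.
print(statistics.variance(data))   # sample variance (VAR_SAMP-style)      -> 40
print(statistics.pvariance(data))  # population variance (VAR_POP-style)   -> 32
print(round(statistics.stdev(data), 3))   # sample standard deviation
print(round(statistics.pstdev(data), 3))  # population standard deviation
```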
Corrupted Section Headers

Multiple section headers and inline text in stats.md are corrupted with embedded table output data. For example, lines like | cnt | year | DEfetched rows / total rows = 5/5 and Group by both yearly spans and department number (by default, null DEfetched rows... indicate that result table content was accidentally merged into prose and code blocks. The Example 12 section is particularly affected.

```text
| Name  | DEfetched rows / total rows = 3/3
+-----+---------------------------+
| cnt | instrumentationScope.name |
|-----+---------------------------|
| 1   | opentelemetry-dotnet      |
| 1   | opentelemetry-go          |
| 2   | opentelemetry-js          |
+-----+---------------------------+TNO | birthday   |
|-------+--------+------------|
| Alice | 1      | 2024-04-21 |
| Bob   | 2      | 2025-08-21 |
| Jeff  | null   | 2025-04-22 |
| Adam  | 2      | null       |
+-------+--------+------------+

The following query groups data by yearly spans of the birthday field, automatically excluding null values:

source=example
| stats count() as cnt by span(birthday, 1y) as year

The query returns the following results:

fetched rows / total rows = 3/3
+-----+------------+
| cnt | year       |
|-----+------------|
| 1   | 2024-01-01 |
| 2   | 2025-01-01 |
+-----+------------+

[... excerpt truncated: the same pattern of injected result tables continues through the department-number grouping and `bucket_nullable=false` sub-examples of Example 12 ...]
```

[Misleading Example](https://github.com/opensearch-project/sql/pull/5303/files#diff-dd984cb777515ecc5ea6aa39ef6df5a798c5eb668370ba32279b00f311be3d4cR754-R770)

The section "Each backslash in the search value must be escaped with another backslash. For example, the following query searches for Windows file paths by properly escaping backslashes" now uses a query `search severityText="WARN"` which has nothing to do with backslash escaping or Windows file paths. The example no longer matches its explanatory prose.

````markdown
Each backslash in the search value must be escaped with another backslash. For example, the following query searches for Windows file paths by properly escaping backslashes:

```ppl
search severityText="WARN" source=otellogs
| sort `resource.attributes.service.name` | fields body | head 1
```

The query returns the following results:

```text
fetched rows / total rows = 1/1
+--------------------------------------------------------+
| body                                                   |
|--------------------------------------------------------|
| SSL certificate for api.example.com expires in 14 days |
+--------------------------------------------------------+
```
````

[Inconsistent NOT Example](https://github.com/opensearch-project/sql/pull/5303/files#diff-dd984cb777515ecc5ea6aa39ef6df5a798c5eb668370ba32279b00f311be3d4cR339-R360)

The NOT operator example claims "Logs without service metadata are included" but the result table shows the same three rows as the `!=` operator example (frontend, cart, product-catalog), which does not demonstrate the difference between `!=` and `NOT` (i.e., null inclusion). The example should show rows with null service names to illustrate the semantic difference.

````markdown
**`NOT` operator**

Find all logs that do not specify `cert-monitor` as the service name (including those with null service name values):

```ppl
search NOT `resource.attributes.service.name`="cert-monitor" source=otellogs
| fields severityText, `resource.attributes.service.name`
| head 3
```

The query returns the following results. Logs without service metadata are included:

```text
fetched rows / total rows = 3/3
+--------------+----------------------------------+
| severityText | resource.attributes.service.name |
|--------------+----------------------------------|
| INFO         | frontend                         |
| INFO         | cart                             |
| WARN         | product-catalog                  |
+--------------+----------------------------------+
```
````
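The null-handling difference the review asks for can be illustrated outside PPL. A minimal Python sketch of SQL-style semantics (the sample rows are hypothetical; `neq` and `not_eq` are names invented for this illustration):

```python
# Under SQL-style three-valued logic, `field != value` drops rows where the
# field is NULL, while `NOT field = value` is documented here as also
# returning rows with missing values.
rows = [
    {"severityText": "INFO", "service": "frontend"},
    {"severityText": "WARN", "service": "cert-monitor"},
    {"severityText": "ERROR", "service": None},  # no service metadata
]

def neq(row):
    # `!=` semantics: a NULL comparison is never true, so the row is excluded.
    return row["service"] is not None and row["service"] != "cert-monitor"

def not_eq(row):
    # `NOT =` semantics: negation of a non-match, so NULL rows pass through.
    return not (row["service"] == "cert-monitor")

print([r["service"] for r in rows if neq(r)])     # ['frontend']
print([r["service"] for r in rows if not_eq(r)])  # ['frontend', None]
```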

[Regex Order Issue](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R272)

The new PPL block regex (lines 259-269) converts `otellogs` blocks to a `sql` fence with a playground button, and the subsequent catch-all regex on line 272 (`re.sub(r'^```ppl\b.*$', '```sql', ...)`) applies only to remaining PPL blocks. The first regex replaces the entire fenced block including the closing fence, while the second replaces only the opening fence line. If the first regex fails to match (e.g., due to edge cases), the second regex leaves the closing fence unchanged. This interaction should be validated carefully.

```python
content = re.sub(
    r'^```ppl\b[^\n]*\n(.*?)^```$',
    lambda m: (
        '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
        if _is_playground_eligible(m.group(1), current_file_path)
        else (
            '```ppl\n' + m.group(1) + '```'
        )
    ),
    content, flags=re.MULTILINE | re.DOTALL,
)

# Convert remaining PPL code fences to SQL
content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
```
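The two-pass interaction can be checked with a minimal, self-contained sketch. This simplifies the exporter's first pass (the `_is_playground_eligible` branch and the playground/copy includes are omitted) but keeps the same regex order, and shows that a block the first pass misses is only half-converted:

```python
import re

def rewrite(content: str) -> str:
    """Simplified two-pass rewrite mirroring the exporter's regex order."""
    # Pass 1: rewrite whole ```ppl ... ``` blocks; a closing fence is
    # required for this pattern to match at all.
    content = re.sub(
        r'^```ppl\b[^\n]*\n(.*?)^```$',
        lambda m: '```sql\n' + m.group(1) + '```',
        content, flags=re.MULTILINE | re.DOTALL,
    )
    # Pass 2: catch-all that rewrites only the opening fence line of
    # anything pass 1 missed.
    return re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)

complete = "```ppl\nsource=otellogs | head 1\n```"
unterminated = "```ppl\nsource=otellogs | head 1"  # no closing fence

print(rewrite(complete))      # whole block rewritten by pass 1
print(rewrite(unterminated))  # only the opening fence rewritten, by pass 2
```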

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

PR Code Suggestions ✨

Latest suggestions up to f0e3a6a

Explore these optional code suggestions:

Possible issue
Fix corrupted markdown document structure

The stats.md file has been corrupted — raw query result output has been injected
directly into the ## Parameters section header and the parameter table header,
breaking the document structure. The query result blocks appear to have been
incorrectly merged into the markdown prose. These sections need to be restored to
their original clean markdown headings and table structure.

docs/user/ppl/cmd/stats.md [18-36]

-## fetched rows / total rows = 1/1
-+-----+
-| p90 |
-|-----|
-| 21  |
-+-----+arameters
+## Parameters
 
 The `stats` command supports the following parameters.
 
-| fetched rows / total rows = 5/5
-...
-+--------------------------------+--------------+arameter | Required/Optional | Description |
+| Parameter | Required/Optional | Description |
Suggestion importance[1-10]: 9


Why: The stats.md file has raw query result output injected directly into the ## Parameters section header and the parameter table header, completely breaking the document structure. This is a critical documentation corruption issue that needs to be fixed.

High
Fix corrupted aggregation function list entries

The aggregation function list in stats.md has raw query result tables injected mid-word throughout the bullet points (e.g., `VAR_SAM<table>` instead of `VAR_SAMP`). This corrupts all the function name entries and their descriptions. The list needs to be restored to clean bullet points with proper function names.

docs/user/ppl/cmd/stats.md [52-187]

-* `VAR_SAMfetched rows / total rows = 5/5
-...
-` -- Sample variance
-* `VAR_fetched rows / total rows = 1/1
-...
-Ofetched rows / total rows = 3/3
-...
-` -- fetched rows / total rows = 1/1
-...
-opulation variance
-* `STDDEV_SAMfetched rows / total rows = 4/4
-...
-` -- Sample standard deviation
-* `STDDEV_fetched rows / total rows = 5/5
-...
-Ofetched rows / total rows = 1/1
-...
-` -- fetched rows / total rows = 5/5
-...
-opulation standard deviation
+* `VAR_SAMP` -- Sample variance
+* `VAR_POP` -- Population variance
+* `STDDEV_SAMP` -- Sample standard deviation
+* `STDDEV_POP` -- Population standard deviation
+* `DISTINCT_COUNT_APPROX` -- Approximate distinct count
+* `TAKE` -- List of original values
+* `PERCENTILE`/`PERCENTILE_APPROX` -- Percentile calculations
+* `PERC<percent>`/`P<percent>` -- Percentile shortcut functions
Suggestion importance[1-10]: 9


Why: The aggregation function list has raw query result tables injected mid-word throughout the bullet points (e.g., VAR_SAM<table> instead of VAR_SAMP), corrupting all function name entries and descriptions. This is a critical documentation corruption that makes the content unreadable.

High
Fix corrupted Parameters section header

The ## Parameters section header and the parameter table header have been
accidentally replaced with query result blocks. This appears to be a copy-paste
error that corrupts the documentation structure, making the parameters section
unreadable.

docs/user/ppl/cmd/head.md [16-30]

-## fetched rows / total rows = 1/1
-+------------+
-| total_logs |
-|------------|
-| 20         |
-+------------+arameters
+## Parameters
 
 The `head` command supports the following parameters.
 
-| fetched rows / total rows = 1/1
-+--------------+
-| avg_severity |
-|--------------|
-| 12.0         |
-+--------------+arameter | Required/Optional | Description |
+| Parameter | Required/Optional | Description |
Suggestion importance[1-10]: 9


Why: The ## Parameters section header and the parameter table header have been accidentally replaced with query result blocks, making the documentation structurally broken and unreadable. This is a clear copy-paste error that needs to be fixed.

High
Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 use identical PPL queries (both use dedup
instrumentationScope.name without keepempty=true) and return identical results. The
first sub-example is supposed to demonstrate keepempty=true behavior (keeping null
values), while the second demonstrates the default behavior (dropping nulls). The
first query is missing keepempty=true and the result should show a null row.

docs/user/ppl/cmd/dedup.md [107-127]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates while keeping documents with `null` values in the specified field:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

-fetched rows / total rows = 3/3
+fetched rows / total rows = 4/4
+-----------------------------------------------------------------------------+
| instrumentationScope.name                                                   |
|-----------------------------------------------------------------------------|
+| null                                                                        |
| @opentelemetry/instrumentation-http                                         |
| Microsoft.Extensions.Hosting                                                |
| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
+-----------------------------------------------------------------------------+
Suggestion importance[1-10]: 8

Why: Both sub-examples in Example 3 use identical PPL queries, but they are meant to demonstrate different behaviors (`keepempty=true` vs default). The first query is missing `keepempty=true` and the result should include a null row, making this a genuine documentation bug.

Medium

Fix filter value to match actual dataset service name


The example filters for `resource.attributes.service.name = 'payment-service'`, but the dataset uses `payment` (not `payment-service`) as the service name, resulting in 0 rows returned. An example that intentionally returns no results is confusing for documentation purposes and does not demonstrate the `AND` operator effectively. The service name in the filter should match an actual value present in the dataset, such as `payment`.

[docs/user/ppl/cmd/where.md [50-69]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R50-R69)

```diff
 ## Example 2: Filter using combined criteria
 
 The following query narrows down errors to a specific service during an incident investigation, combining severity and service name conditions with `AND`:
 
 ```ppl
 source=otellogs
-| where severityNumber >= 17 AND `resource.attributes.service.name` = 'payment-service'
+| where severityNumber >= 17 AND `resource.attributes.service.name` = 'payment'
 | sort severityNumber
 | fields severityText, severityNumber, `resource.attributes.service.name`

The query returns the following results:

-fetched rows / total rows = 0/0
+fetched rows / total rows = 2/2
+--------------+----------------+----------------------------------+
| severityText | severityNumber | resource.attributes.service.name |
|--------------+----------------+----------------------------------|
+| ERROR        | 17             | payment                          |
+| ERROR        | 17             | payment                          |
+--------------+----------------+----------------------------------+
Suggestion importance[1-10]: 6

Why: The filter uses `'payment-service'` but the dataset contains `'payment'`, causing 0 rows returned. An example returning empty results for a combined `AND` filter is misleading and doesn't demonstrate the feature effectively.

Low

General

Remove duplicate example with identical query and results

Example 2 is a duplicate of Example 1 — both use the exact same PPL query and return the exact same results. This provides no additional value to the reader. Replace Example 2 with a distinct scenario, such as demonstrating the `replace` command with a different field or showing how multiple replacements interact when both patterns match.

[docs/user/ppl/cmd/replace.md [52-78]](https://github.com/opensearch-project/sql/pull/5303/files#diff-8644862a762e96b24031ad83e31729768a40b3387fda49b9a3a4f0c6f4c3de39R52-R78)

```diff
-## Example 2: Replace service names for display
+## Example 2: Replace severity text labels for display
 
-The following query shortens service names for a compact dashboard view:
+The following query replaces verbose severity labels with shorter abbreviations for compact reporting:
 
 ```ppl
 source=otellogs
 | where severityText IN ('ERROR', 'WARN')
-| replace 'payment-service' with 'payment' in `resource.attributes.service.name`
-| replace 'inventory-service' with 'inventory' in `resource.attributes.service.name`
+| replace 'ERROR' with 'ERR' in severityText
+| replace 'WARN' with 'WRN' in severityText
 | sort severityNumber, `resource.attributes.service.name`
 | fields severityText, `resource.attributes.service.name`
 | head 4

-The query returns the following results:

-fetched rows / total rows = 4/4
-+--------------+----------------------------------+
-| severityText | resource.attributes.service.name |
-|--------------+----------------------------------|
-| WARN         | frontend-proxy                   |
-| WARN         | frontend-proxy                   |
-| WARN         | product-catalog                  |
-| WARN         | product-catalog                  |
-+--------------+----------------------------------+

Suggestion importance[1-10]: 7

Why: Example 2 in `replace.md` is an exact duplicate of Example 1 — same PPL query, same description, same results. This provides no value to readers and should be replaced with a distinct scenario.

Medium

Fill in missing example content

Example 3 has a heading but no content — the query, description, and expected output are missing entirely. This leaves a broken section in the documentation that jumps directly from an empty heading to Example 4.

[docs/user/ppl/cmd/convert.md [78-80]](https://github.com/opensearch-project/sql/pull/5303/files#diff-76c639678d1e487015ecd16ab9353af28891a46b2d29760c877e41904ed29eedR78-R80)

```diff
 ## Example 3: Extract and convert memory sizes from error messages
+
+The following query extracts and converts memory size strings from error log messages to kilobytes:
+
+```ppl
+source=otellogs
+| where severityText = 'ERROR'
+| rex field=body "(?<mem_size>\d+[kmg]b?)"
+| convert memk(mem_size)
+| where NOT ISNULL(mem_size)
+| fields body, mem_size
+| head 3
+```
 
 ## Example 4: Multiple field conversions
Suggestion importance[1-10]: 6


Why: Example 3 has a heading but no content, jumping directly to Example 4. While the suggestion provides plausible content, the improved code is speculative and may not match the actual intended example for this PR.

Low
Ensure non-playground PPL blocks get copy buttons

The non-playground branch converts ppl blocks back to ppl fences without adding a
copy button or converting to sql. However, the subsequent re.sub call converts all
remaining ppl fences to sql without adding a copy button. Non-playground ppl blocks
should be converted to sql with a copy button in the first pass, or the else branch
should emit sql with a copy button directly, to avoid inconsistent output.

scripts/docs_exporter/export_to_docs_website.py [259-269]

 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```\n{% include copy.html %}'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
Suggestion importance[1-10]: 5


Why: The non-playground else branch keeps ppl fences without a copy button, relying on the subsequent re.sub to convert them to sql, but that subsequent call also doesn't add copy buttons. Converting to sql with a copy button directly in the else branch would be more consistent and correct.

Low
Fix duplicate row in override example result table

The result table for Example 3 shows WARN appearing twice with the same agg value of
4, which is inconsistent with the described behavior of override=true replacing main
search values with subsearch values. The subsearch only covers ERROR and WARN, so
INFO and DEBUG rows from the main search should retain their original counts while
ERROR and WARN are overridden. The duplicated WARN row and the explanation should be
reviewed for accuracy.

docs/user/ppl/cmd/appendcol.md [94-104]

```diff
 ```text
 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
```

Suggestion importance[1-10]: 5

Why: The result table shows `WARN` duplicated with the same `agg=4`, which is inconsistent with the described `override=true` behavior. The table should show all four severity levels (`DEBUG`, `ERROR`, `INFO`, `WARN`), with the subsearch overriding only the `ERROR` and `WARN` values.

Low

___

#### Previous suggestions


<details><summary>Suggestions up to commit d0b2a08</summary>
<br>

</details>
#### Suggestions up to commit cc17d7e

Possible issue
Fix duplicate queries with missing parameter

Example 3 contains two identical PPL queries that are supposed to demonstrate different behaviors (`keepempty=true` vs. the default). The first query is missing `keepempty=true`, and both queries produce the same result, making the distinction between the two behaviors unclear. The first query should include `keepempty=true` to demonstrate that null values are retained.

[docs/user/ppl/cmd/dedup.md [86-126]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR86-R126)

```diff
 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

-fetched rows / total rows = 2/2
+fetched rows / total rows = 3/3
+---------------------------+
| instrumentationScope.name |
|---------------------------|
+| null                      |
| opentelemetry-java        |
| opentelemetry-python      |
+---------------------------+

The following query deduplicates while ignoring documents with empty values in the specified field:

source=otellogs
| dedup instrumentationScope.name
| fields instrumentationScope.name
| sort instrumentationScope.name
```

Suggestion importance[1-10]: 8

Why: The two PPL queries in Example 3 are identical, but they should demonstrate different behaviors. The first query is missing `keepempty=true`, which would show null values being retained; as written, the documentation is misleading.

Medium
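The `keepempty` distinction at issue can be modeled outside PPL. The sketch below uses assumed semantics (keep the first document per distinct value; null-valued documents are dropped unless `keepempty=true` retains them), not the engine's actual implementation:

```python
def dedup(docs, field, keepempty=False):
    """Model of PPL `dedup <field>`: keep the first document per distinct
    value; documents whose field is None are dropped unless keepempty=True."""
    seen = set()
    out = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            if keepempty:
                out.append(doc)  # null values are retained, never deduplicated
            continue
        if value not in seen:
            seen.add(value)
            out.append(doc)
    return out

logs = [
    {"scope": "opentelemetry-java"},
    {"scope": None},
    {"scope": "opentelemetry-java"},
    {"scope": "opentelemetry-python"},
]
print(len(dedup(logs, "scope")))                  # nulls dropped
print(len(dedup(logs, "scope", keepempty=True)))  # nulls retained
```

On this data the default drops the null document and one duplicate, while `keepempty=True` keeps the null row, which is exactly the one-row difference the two documentation examples are meant to illustrate.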



Fix incorrect duplicate row in result table

The result table for Example 3 (the `override` parameter) is missing the `DEBUG` row and shows `FATAL` duplicated, which is inconsistent with the described behavior. The main search produces 5 severity levels (DEBUG, ERROR, FATAL, INFO, WARN), but the result shows only 5 rows, with FATAL appearing twice and DEBUG missing. This is likely a documentation error that could confuse readers about how `override=true` works.

[docs/user/ppl/cmd/appendcol.md [94-104]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R94-R104)

```diff
 ```text
 fetched rows / total rows = 5/5
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 5   | ERROR        |
-| 2   | FATAL        |
 | 2   | FATAL        |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
```

Suggestion importance[1-10]: 6

Why: The result table shows `FATAL` duplicated and `DEBUG` missing, which appears inconsistent with the described `override=true` behavior. However, this could be intentional, depending on how the subsearch aligns rows positionally, so the actual correctness depends on the implementation.

Low
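The positional-alignment interpretation raised above can be made concrete. The sketch below is a hypothetical model of `appendcol override=true` (the function name and the alignment rule are assumptions, not the engine's code); it reproduces the duplicated-row symptom when the subsearch returns fewer rows than the main search:

```python
def appendcol_positional(main, sub, override):
    """Align subsearch rows to main rows by position: row i of the
    subsearch overwrites row i of the main search when override=True."""
    out = [dict(row) for row in main]
    for i, row in enumerate(sub):
        if i < len(out) and override:
            out[i].update(row)
    return out

main = [
    {"severityText": "DEBUG", "agg": 3},
    {"severityText": "ERROR", "agg": 5},
    {"severityText": "INFO", "agg": 6},
    {"severityText": "WARN", "agg": 4},
]
sub = [
    {"severityText": "ERROR", "agg": 7},
    {"severityText": "WARN", "agg": 4},
]
# Positional alignment overwrites the first two main rows (DEBUG and ERROR),
# which is how a duplicated severity value can appear in the output table.
result = appendcol_positional(main, sub, override=True)
print(result)
```

Under this model the output severities become ERROR, WARN, INFO, WARN, matching the duplicated `WARN | 4` row the suggestion flags; a key-based alignment would instead override only the matching ERROR and WARN rows.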



Fix inaccurate total row count in result

The new data contains two documents matching "Payment failed" (one ERROR and one FATAL), but the query claims `fetched rows / total rows = 1/1`. The `head 1` clause limits output to one row, but the total count should reflect all matching rows (2/2). Verify that the expected total count in the result block is accurate.

[docs/user/ppl/cmd/search.md [178]](https://github.com/opensearch-project/sql/pull/5303/files#diff-dd984cb777515ecc5ea6aa39ef6df5a798c5eb668370ba32279b00f311be3d4cR178-R178)

```diff
 search "Payment failed" source=otellogs
 | sort `resource.attributes.service.name` | fields body | head 1
+```
 
+The query returns the following results:
+
+```text
+fetched rows / total rows = 1/2
++---------------------------------------------------------------------+
+| body                                                                |
+|---------------------------------------------------------------------|
+| Payment failed: connection timeout to payment gateway after 30000ms |
++---------------------------------------------------------------------+
+
```

Suggestion importance[1-10]: 4

Why: The suggestion raises a valid concern about the `fetched rows / total rows = 1/1` count when `head 1` is used, but the improved code changes the total to `1/2`, which may or may not be accurate depending on how the engine reports totals with `head`. The claim about two matching documents needs verification against the actual data.

Low
General
Remove duplicate documentation examples

Examples 1 and 2 are identical in both query and output, which provides no
additional value to the reader. Replace Example 2 with a distinct use case, such as
replacing a severity text value or normalizing a different field, to demonstrate the
command's versatility.

docs/user/ppl/cmd/replace.md [24-78]

```diff
-## Example 1: Normalize service names for a compact dashboard
+## Example 2: Normalize severity text labels
 
-The following query shortens long service names for a compact dashboard view:
-...
-## Example 2: Replace service names for display
+The following query replaces verbose severity labels with shorter codes for compact display:
 
-The following query shortens service names for a compact dashboard view:
-...
-| replace 'payment-service' with 'payment' in `resource.attributes.service.name`
-| replace 'inventory-service' with 'inventory' in `resource.attributes.service.name`
+```ppl
+source=otellogs
+| where severityText IN ('ERROR', 'FATAL')
+| replace 'ERROR' with 'ERR' in severityText
+| replace 'FATAL' with 'FTL' in severityText
 | sort severityNumber, `resource.attributes.service.name`
 | fields severityText, `resource.attributes.service.name`
 | head 4
+```
```
Suggestion importance[1-10]: 7


Why: Examples 1 and 2 in replace.md are completely identical in both query and output, which is a clear documentation error that provides no value. This is a valid and important observation that should be fixed to improve documentation quality.

Medium
Restore removed syntax reference sections

The PR removes the syntax reference sections ("Optional choices", "Repetition") that
explain PPL syntax conventions, replacing them entirely with examples. These
reference sections are important for users learning PPL syntax and should be
retained in a syntax reference document. The examples are a good addition but should
supplement, not replace, the syntax documentation.

docs/user/ppl/cmd/syntax.md [52-60]

```diff
+### Optional choices
+
+Optional choices between alternatives are shown in square brackets with pipe separators (`[option1 | option2]`). You can choose one of the options or omit them entirely.
+
+**Example**: `[asc | desc]` means you can specify `asc`, `desc`, or neither.
+
+### Repetition
+
+An ellipsis (`...`) indicates that the preceding element can be repeated multiple times.
+
+**Examples**:
+- `<field>...` means one or more fields without commas: `field1 field2 field3`
+- `<field>, ...` means comma-separated repetition: `field1, field2, field3`
+
 ## Example 1: Fetch data from an index with a filter
 
 The following query retrieves all error logs from the `otellogs` index. The filter is specified inline with the `source` command:
 
 ```ppl
 source=otellogs severityText = 'ERROR'
 | fields severityText, `resource.attributes.service.name`
 | sort `resource.attributes.service.name`
```
Suggestion importance[1-10]: 6

Why: The PR removes the "Optional choices" and "Repetition" syntax reference sections from `syntax.md`, which are valuable for users learning PPL conventions. Restoring them alongside the new examples would make the documentation more complete and useful.

Low



Add missing query results block

The subsection "Matching with a prefix pattern" shows a query but is missing its expected output block. Every other example in the file includes a results block, and this omission leaves the documentation incomplete and inconsistent.

[docs/user/ppl/cmd/where.md [104-113]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R104-R113)

```diff
 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services in the `cart` domain, useful when investigating issues across a team's microservices:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'cart-%')
 | sort severityNumber
 | fields severityText, `resource.attributes.service.name`, body

+The query returns the following results:
+
+```text
+fetched rows / total rows = 3/3
++--------------+----------------------------------+----------------------------------------------------------------------------------------------+
+| severityText | resource.attributes.service.name | body                                                                                         |
+|--------------+----------------------------------+----------------------------------------------------------------------------------------------|
+| INFO         | cart-service                     | Order #1234 placed successfully by user U100                                                 |
+| INFO         | cart-service                     | Kafka consumer group rebalanced: 4 partitions assigned to consumer cart-consumer-01          |
+| ERROR        | cart-service                     | Kafka producer delivery failed: message too large for topic order-events (max 1048576 bytes) |
++--------------+----------------------------------+----------------------------------------------------------------------------------------------+
+```
```

Suggestion importance[1-10]: 5

Why: The "Matching with a prefix pattern" subsection is missing its expected output block, making it inconsistent with all other examples in the file. However, the suggested output values cannot be fully verified from the PR diff alone, so the score is moderate.

Low



Restore wildcard support documentation in parameters

The PR removed the documentation about wildcard support in the `rename` command parameters. The syntax and behavior notes about wildcards were part of the original documentation, and if wildcard renaming is still a supported feature, removing this information from the parameters table could mislead users into thinking it is not supported.

[docs/user/ppl/cmd/rename.md [10-22]](https://github.com/opensearch-project/sql/pull/5303/files#diff-12b43dbb9b9164dd4ac4501f6cf49f35d91cc51da8713b16d418bfa9659f0aacR10-R22)

```diff
 The `rename` command has the following syntax:
 
 ```syntax
 rename <source-field> AS <target-field> ["," <source-field> AS <target-field>]...

Parameters

Parameter Required/Optional Description
- <source-field> Required
- <target-field> Required
+ <source-field> Required
+ <target-field> Required
```

Suggestion importance[1-10]: 5

Why: The PR removed wildcard support documentation from the `rename` command parameters table. If wildcard renaming is still supported, this omission could mislead users; however, the PR may have intentionally removed it to simplify the documentation, or because the feature is no longer supported.

Low



Clarify transposed row count in description

The result table for Example 1 shows 5 data columns (row 1 through row 5), while the header reads "fetched rows / total rows = 2/2", referring to the 2 rows being transposed (log_count and severityText). The data has 5 severity levels (DEBUG, ERROR, FATAL, INFO, WARN), so the transpose correctly produces 5 columns. The header reflects the number of transposed output rows (2 field rows), and the column count (5) is correct; the description text should clarify that this transposes 2 fields across 5 rows of data, not 2 rows total.

[docs/user/ppl/cmd/transpose.md [29-36]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ec1fe93388c8d21c3152db80ad223e173eeb0bfb9685a46b9d3ea389f31748b0R29-R36)

```diff
+```text
+fetched rows / total rows = 2/2
++--------------+-------+-------+-------+-------+-------+
+| column       | row 1 | row 2 | row 3 | row 4 | row 5 |
+|--------------+-------+-------+-------+-------+-------|
+| log_count    | 3     | 5     | 2     | 6     | 4     |
+| severityText | DEBUG | ERROR | FATAL | INFO  | WARN  |
++--------------+-------+-------+-------+-------+-------+
+```
 
-
```

Suggestion importance[1-10]: 2

Why: The existing code and improved code are identical, meaning no actual change is proposed. The suggestion only asks to clarify the description text but doesn't provide a concrete fix in the improved code.

Low
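The 2-rows-versus-5-columns arithmetic discussed above can be checked with a toy transpose. This sketch assumes the simple semantics described in the suggestion (each input field becomes one output row, each input row becomes a `row N` column); it is illustrative, not the engine's implementation:

```python
def transpose(rows, fields):
    """Transpose a result set: each input field becomes one output row,
    and each input row becomes a `row N` column."""
    out = []
    for field in fields:
        entry = {"column": field}
        for i, row in enumerate(rows, start=1):
            entry[f"row {i}"] = row[field]
        out.append(entry)
    return out

rows = [
    {"severityText": "DEBUG", "log_count": 3},
    {"severityText": "ERROR", "log_count": 5},
    {"severityText": "FATAL", "log_count": 2},
    {"severityText": "INFO", "log_count": 6},
    {"severityText": "WARN", "log_count": 4},
]
result = transpose(rows, ["log_count", "severityText"])
print(len(result))         # transposed output rows, one per field
print(len(result[0]) - 1)  # data columns, one per input row
```

Transposing 2 fields across 5 input rows yields 2 output rows and 5 data columns, which is consistent with the table shown and with reading "fetched rows / total rows = 2/2" as a count of output rows.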

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0e3a6a.

| Path | Line | Severity | Description |
|------|------|----------|-------------|
| docs/user/ppl/cmd/stats.md | 18 | low | Query result output is embedded directly inside section headings and aggregation function bullet points (e.g., '## fetched rows / total rows = 1/1 ... +arameters' and 'VAR_SAMfetched rows...+--...+arameter'). This corrupted content is anomalous and inconsistent with normal editing behavior, though it appears to be an accidental copy-paste error rather than deliberate injection. |
| docs/user/ppl/cmd/head.md | 15 | low | Query result tables are embedded inside the '## Parameters' section heading and the parameter table header, producing malformed markdown. The same pattern as stats.md, anomalous but likely an accidental copy-paste error during the documentation rewrite. |

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Persistent review updated to latest commit d0b2a08

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions bot commented Apr 3, 2026

Persistent review updated to latest commit f0e3a6a
