Skip to content

Replace nested Stream.concat with direct iteration in expression merging#6725

Open
RanVaknin wants to merge 4 commits intomasterfrom
rvaknin/ddb-optimization-updateExpression-stream-flattening
Open

Replace nested Stream.concat with direct iteration in expression merging#6725
RanVaknin wants to merge 4 commits intomasterfrom
rvaknin/ddb-optimization-updateExpression-stream-flattening

Conversation

@RanVaknin
Copy link
Contributor

@RanVaknin RanVaknin commented Feb 10, 2026

Overview

This PR targets two bottlenecks in UpdateExpressionConverter.toExpression as part of the Update operation request pipeline. toExpression scales and becomes the most noticable bottleneck the bigger the input is: ~40% cpu time in SMALL datasets, and ~74% in HUGE_FLAT datasets.

Interactive Flamegraph

image

Testing

We have fairly good test coverage already: convert_mixedActions_uniqueNameTokens and convert_mixedActions verify the string output for correctness. This PR changes the underlying implementation but the API surface remains unchanged.


1. String.format overhead (~18% CPU time)

groupExpressions uses String.format to build the update expression for each attribute. Can replace with a StringJoiner which avoids the formatter's overhead.

image

2. Nested Stream.concat() in mergeExpressionNames() and mergeExpressionValues() (15-50% CPU time)

image

Current implementation:

private static Map<String, String> mergeExpressionNames(UpdateExpression expression) {
    return streamOfExpressionNames(expression)
        .reduce(Expression::joinNames)
        .orElseGet(Collections::emptyMap);
}

private static Stream<Map<String, String>> streamOfExpressionNames(UpdateExpression expression) {
    return Stream.concat(
        expression.setActions().stream().map(SetAction::expressionNames),
        Stream.concat(
            expression.removeActions().stream().map(RemoveAction::expressionNames),
            Stream.concat(
                expression.deleteActions().stream().map(DeleteAction::expressionNames),
                expression.addActions().stream().map(AddAction::expressionNames)
            )
        )
    );
}

The problem is that Stream.reduce() operation calls Expression.joinNames() for each action type (SET, ADD, REMOVE, DELETE). This method creates a new HashMap and copies all existing entries on every iteration:

public static Map<String, String> joinNames(Map<String, String> map1, Map<String, String> map2) {
    Map<String, String> result = new HashMap<>(map1);  // <- Allocate + copy all entries
    map2.forEach((key, value) -> {
        String oldValue = result.put(key, value);
        if (oldValue != null && !oldValue.equals(value)) {
            throw new IllegalArgumentException(...);
        }
    });
    return Collections.unmodifiableMap(result);
}

Example: With N attributes distributed across 4 action types, Stream.reduce() creates 3 intermediate HashMaps. For example, with 20 attributes (5 per action type), it copies 5 entries, then 10, then 15, before producing the final 20 entry result. The cost scales poorly with dataset size.

Instead we can iterate directly on each action and use a single HashMap:

private static Map<String, String> mergeExpressionNames(UpdateExpression expression) {
    Map<String, String> merged = new HashMap<>();
    
    for (SetAction action : expression.setActions()) {
        mergeNamesInto(merged, action.expressionNames());
    }
    for (RemoveAction action : expression.removeActions()) {
        mergeNamesInto(merged, action.expressionNames());
    }
    for (DeleteAction action : expression.deleteActions()) {
        mergeNamesInto(merged, action.expressionNames());
    }
    for (AddAction action : expression.addActions()) {
        mergeNamesInto(merged, action.expressionNames());
    }
    
    return merged.isEmpty() ? Collections.emptyMap() : Collections.unmodifiableMap(merged);
}

private static void mergeNamesInto(Map<String, String> target, Map<String, String> source) {
    if (source == null || source.isEmpty()) {
        return;
    }
    source.forEach((key, value) -> {
        String oldValue = target.get(key);
        if (oldValue != null && !oldValue.equals(value)) {
            throw new IllegalArgumentException(
                String.format("Attempt to coalesce two expressions with conflicting expression names. "
                            + "Expression name key = '%s'", key));
        }
        target.put(key, value);
    });
}

This eliminates intermediate HashMap allocations and theredundant entry copying. The same proposed optimization also applies to mergeExpressionValues().


Results:

toExpression went from ~40% CPU time to ~15%

  • groupExpressions went from 18% CPU time to ~2.5%
  • mergeExpressionValues and mergeExpressionNames went down from 18% to ~4%
image
Operation Size Master (μs) String Fix (μs/op) Δ +Stream flattening (μs/op) Total Δ
Update TINY 1.38 1.395 +1.09% 1.366 -1.01%
Update SMALL 9.174 8.62 -6.43% 7.202 -21.50%
Update HUGE 39.23 37.51 -4.38% 34.438 -12.22%
Update HUGE_FLAT 232.677 224.088 -3.69% 81.248 -65.08%

@RanVaknin RanVaknin force-pushed the rvaknin/ddb-optimization-updateExpression-stream-flattening branch from 2bf8d81 to e40e293 Compare February 24, 2026 18:55
@RanVaknin RanVaknin marked this pull request as ready for review February 24, 2026 18:55
@RanVaknin RanVaknin requested a review from a team as a code owner February 24, 2026 18:55
@RanVaknin RanVaknin added the no-api-surface-area-change Indicate there is no API surface area change and thus API surface area review is not required label Feb 24, 2026
@sonarqubecloud
Copy link

mergeValuesInto(merged, action.expressionValues());
}

return merged.isEmpty() ? Collections.emptyMap() : Collections.unmodifiableMap(merged);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Why not just returned Collections.unmodifiableMap(merged) unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary but we can avoid wrapping an empty map with UnmodifiableMap this way.

Comment on lines +183 to +184
String.format("Attempt to coalesce two expressions with conflicting expression names. "
+ "Expression name key = '%s'", key));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I can see why you'd think that. It's a bit hard to read the diff.

In the current code streamOfExpressionNames calls joinNames() which has the same validation.

throw new IllegalArgumentException(
String.format("Attempt to coalesce two expressions with conflicting expression names. "
+ "Expression name key = '%s'", key));

In the new code I just moved the validation here.

We also have an existing test covering this specific codepath

void convert_removeActions_uniqueAttributes_duplicateNameTokens_error() {
UpdateExpression updateExpression = createUpdateExpression(
RemoveAction.builder().path("attribute_ref").putExpressionName("attribute_ref", "attribute1").build(),
RemoveAction.builder().path("attribute_ref").putExpressionName("attribute_ref", "attribute2").build());
assertThatThrownBy(() -> UpdateExpressionConverter.toExpression(updateExpression))
.isInstanceOf(IllegalArgumentException.class)
.hasMessageContaining("Attempt to coalesce two expressions with conflicting expression names")
.hasMessageContaining("attribute_ref");
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-api-surface-area-change Indicate there is no API surface area change and thus API surface area review is not required

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants