Spark 4.1: New Async Spark Micro Batch Planner by RjLi13 · Pull Request #15299 · apache/iceberg

RjLi13 · 2026-02-11T19:48:27Z

This is part 2 after splitting PR #15059

Part 1 PR is here: #15298.

This PR focuses only on introducing the new async Spark micro-batch planner and the changes needed to enable it.

Full context is in #15059 but posted below again:

Implements a new feature for Spark Structured Streaming and Iceberg users, known as the Async Spark Micro Batch Planner.

Currently, micro-batch planning in Iceberg is synchronous: a streaming query plans which batches to read and how many rows and files go into each batch, processes the data, and then repeats. The async planner improves streaming performance by pre-fetching table metadata and file scan tasks in a background thread, which reduces micro-batch planning latency. Planning can then overlap with data processing, which speeds up handling of large data volumes.
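The overlap described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the PR's implementation: the class `PrefetchingPlannerSketch`, the queue capacity of 4, and the string "file" names are all hypothetical. The 100 ms poll timeout mirrors the `QUEUE_POLL_TIMEOUT_MS` constant quoted later in the review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from the PR itself.
final class PrefetchingPlannerSketch implements AutoCloseable {
  private static final long QUEUE_POLL_TIMEOUT_MS = 100L;

  // Bounded queue: the background thread blocks once it is this far ahead.
  private final BlockingQueue<List<String>> planned = new LinkedBlockingQueue<>(4);
  private final Thread worker;
  private volatile boolean closed = false;

  PrefetchingPlannerSketch(int totalBatches, int filesPerBatch) {
    worker = new Thread(() -> {
      try {
        for (int batch = 0; batch < totalBatches && !closed; batch++) {
          // Stand-in for reading table metadata and building file scan tasks.
          List<String> files = new ArrayList<>();
          for (int i = 0; i < filesPerBatch; i++) {
            files.add("batch-" + batch + "-file-" + i);
          }
          planned.put(files);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "planner-prefetch");
    worker.setDaemon(true);
    worker.start();
  }

  // Returns the next pre-planned batch, or null if none is ready within the timeout.
  List<String> nextBatch() throws InterruptedException {
    return planned.poll(QUEUE_POLL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    closed = true;
    worker.interrupt();
  }

  public static void main(String[] args) throws Exception {
    try (PrefetchingPlannerSketch planner = new PrefetchingPlannerSketch(3, 2)) {
      List<String> batch;
      while ((batch = planner.nextBatch()) != null) {
        // While this batch is "processed", the worker keeps planning ahead.
        System.out.println("processing " + batch);
      }
    }
  }
}
```

The consumer only waits when the background thread has fallen behind, which is where the latency saving would come from in a design like this.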

This PR adds the option spark.sql.iceberg.async-micro-batch-planning-enabled, which users can set to enable async planning. The existing planning code in SparkMicroBatchStream.java moves to SyncSparkMicroBatchPlanner.java, and SparkMicroBatchStream selects which planner to use. The option defaults to false, so existing behavior is unchanged.
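A minimal sketch of that selection logic, assuming the flag is read from a plain string map. Only the option name comes from the PR description; `PlannerSketch`, `SyncPlannerSketch`, `AsyncPlannerSketch`, and `PlannerSelection` are hypothetical stand-ins, and the real SparkMicroBatchStream may read its configuration differently.

```java
import java.util.Map;

// Hypothetical sketch: interface and class names are illustrative, not the PR's code.
interface PlannerSketch {
  String name();
}

final class SyncPlannerSketch implements PlannerSketch {
  @Override
  public String name() {
    return "sync";
  }
}

final class AsyncPlannerSketch implements PlannerSketch {
  @Override
  public String name() {
    return "async";
  }
}

final class PlannerSelection {
  // Option name as given in the PR description.
  static final String ASYNC_ENABLED = "spark.sql.iceberg.async-micro-batch-planning-enabled";

  private PlannerSelection() {}

  // A missing key falls back to "false", so the synchronous planner stays the default.
  static PlannerSketch choose(Map<String, String> conf) {
    boolean async = Boolean.parseBoolean(conf.getOrDefault(ASYNC_ENABLED, "false"));
    return async ? new AsyncPlannerSketch() : new SyncPlannerSketch();
  }

  public static void main(String[] args) {
    System.out.println(choose(Map.of()).name()); // prints "sync"
    System.out.println(choose(Map.of(ASYNC_ENABLED, "true")).name()); // prints "async"
  }
}
```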

This feature was originally authored by Drew Goya in our Netflix fork for Spark 3.3 and Iceberg 1.4. I built upon Drew's work by porting it to Spark ~~3.5~~ 4.1 and the current Iceberg version.

RjLi13 · 2026-02-11T19:50:54Z

Will put as ready for review when #15298 is merged. cc @bryanck

RjLi13 · 2026-02-15T06:47:50Z

Reposting this comment about benchmark here: #15059 (comment)

RjLi13 · 2026-02-27T20:14:10Z

@bryanck @singhpk234 any chance you were able to review this? Thanks in advance!

bryanck · 2026-03-11T16:15:01Z

spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/AsyncSparkMicroBatchPlanner.java

+class AsyncSparkMicroBatchPlanner extends BaseSparkMicroBatchPlanner implements AutoCloseable {
+  private static final Logger LOG = LoggerFactory.getLogger(AsyncSparkMicroBatchPlanner.class);
+  private static final int PLAN_FILES_CACHE_MAX_SIZE = 10;
+  private static final long QUEUE_POLL_TIMEOUT_MS = 100L; // 100 ms


These could be configurable but it also is valuable not to add too many options.

RjLi13 · 2026-03-12

Yup, I was thinking there were already a few knobs, based on what I documented, and these were less critical for users to tune.
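For readers wondering what a fixed cap like `PLAN_FILES_CACHE_MAX_SIZE = 10` implies, here is a small sketch of a size-bounded cache. The excerpt shows only the constant, not the cache itself, so everything below is an assumption: `BoundedPlanCacheSketch` is a hypothetical name, and least-recently-used eviction via `LinkedHashMap` is a stand-in for whatever the planner actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU map capped at a fixed number of entries.
// Not thread-safe; a real planner shared across threads would need synchronization.
final class BoundedPlanCacheSketch<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int maxSize;

  BoundedPlanCacheSketch(int maxSize) {
    super(16, 0.75f, true); // true = iterate in access order, so the eldest entry is the LRU one
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each insertion; returning true evicts the least recently used entry.
    return size() > maxSize;
  }

  public static void main(String[] args) {
    BoundedPlanCacheSketch<Integer, String> cache = new BoundedPlanCacheSketch<>(10);
    for (int snapshot = 0; snapshot <= 10; snapshot++) {
      cache.put(snapshot, "plan-for-snapshot-" + snapshot);
    }
    System.out.println(cache.size()); // prints 10
    System.out.println(cache.containsKey(0)); // prints false
  }
}
```

A hard-coded cap keeps memory use predictable without adding another user-facing option, which matches the trade-off discussed in this thread.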

bryanck · 2026-03-11T16:19:53Z

This LGTM! Thanks @RjLi13 for the contribution. @singhpk234 do you happen to have any feedback?

This feature was originally built by Drew Goya <dgoya@netflix.com> for Spark 3.3 and Iceberg 1.4.

RjLi13 · 2026-03-12T00:33:39Z

My bad, I accidentally pulled instead of force-pushing after the rebase, which brought commits from main into this PR. It's fixed now, but the PR picked up a bunch of labels; it only touches Spark.

And also Docs if that counts

RjLi13 · 2026-03-12T00:42:13Z

Also, @bryanck, thanks for the approval. Would you and @singhpk234 mind taking another look? I updated two doc pages with the user-facing config and added a small blurb on this feature to the structured streaming section.

bryanck · 2026-03-12T14:39:53Z

The docs LGTM as well.

github-actions bot added the spark label Feb 11, 2026

RjLi13 mentioned this pull request Feb 11, 2026

Spark 4.1: Refactor SparkMicroBatchStream to SyncPlanner #15298

Merged

RjLi13 changed the title ~~Spark: New Async Spark Micro Batch Planner~~ Spark 4.1: New Async Spark Micro Batch Planner Feb 11, 2026

RjLi13 mentioned this pull request Feb 11, 2026

Spark: Async Spark Micro Batch Planner #15059

Closed

RjLi13 force-pushed the async-micro-batch-planner branch 2 times, most recently from 7815e12 to 64f07d6 Compare February 15, 2026 04:46

RjLi13 marked this pull request as ready for review February 15, 2026 04:49

bryanck requested review from bryanck and singhpk234 February 15, 2026 15:05

bryanck reviewed Mar 11, 2026

bryanck approved these changes Mar 11, 2026

github-actions bot added the API, parquet, arrow, core, data, flink, MR, ORC, INFRA, docs, build, hive, Specification, OPENAPI, and KAFKACONNECT labels Mar 12, 2026

Ruijing Li added 3 commits March 11, 2026 17:29

Spark: Introduce Async Planner Feature and dedupe

24a9249

Spark: Credit original async micro-batch planner implementation

e2735ac

This feature was originally built by Drew Goya <dgoya@netflix.com> for Spark 3.3 and Iceberg 1.4.

Spark, Docs: Document new user facing options

98d0edf

RjLi13 force-pushed the async-micro-batch-planner branch from ea8c9ad to 98d0edf Compare March 12, 2026 00:31

github-actions bot removed the API, MR, ORC, INFRA, OPENAPI, and KAFKACONNECT labels Mar 12, 2026

bryanck removed the parquet, arrow, core, data, flink, build, hive, and Specification labels Mar 12, 2026

Tighten the wording

5b9e1c0

RjLi13 wants to merge 4 commits into apache:main from RjLi13:async-micro-batch-planner
