[GLUTEN-10933][VL] feat: Support cached the batches in cpu cache #11758

Open
jinchengchenghh wants to merge 1 commit into apache:main from jinchengchenghh:cudf_shuffle_up
@jinchengchenghh commented Mar 13, 2026

Cache the batches in a CPU-side cache and let the join threads fetch them one by one. The build threads start fetching as soon as possible, but the probe threads have to wait until the build has finished.
The buffer size is temporarily controlled by spark.gluten.sql.columnar.backend.velox.cudf.shuffleMaxPrefetchBytes; later the size may be adjusted according to the remaining memory on the server.
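
The flow described above can be sketched roughly as follows. This is only an illustration in Python, not the code in this PR: the names `prefetch`, `run_side`, `demo`, `gpu_lock`, and `build_done` are hypothetical, batches are modelled as plain `bytes`, and the rule that the last cached batch may overshoot the budget is an assumption.

```python
import threading
from collections import deque
from typing import Deque, Iterator, Tuple


def prefetch(source: Iterator[bytes], max_bytes: int) -> Tuple[Deque[bytes], int]:
    """Pull batches into a host-side queue until the byte budget is reached.

    Assumption: the last batch may overshoot the budget slightly.
    """
    cached: Deque[bytes] = deque()
    cached_bytes = 0
    for batch in source:
        cached.append(batch)
        cached_bytes += len(batch)
        if cached_bytes >= max_bytes:
            break
    return cached, cached_bytes


def run_side(name, source, max_bytes, gpu_lock, log, wait_for=None, done=None):
    # The probe side passes `wait_for` and blocks here until the build is done.
    if wait_for is not None:
        wait_for.wait()
    # Fill the CPU-side cache before blocking on the (hypothetical) GPU lock.
    cached, cached_bytes = prefetch(source, max_bytes)
    log.append((name, "prefetched", len(cached), cached_bytes))
    consumed = 0
    with gpu_lock:
        # Drain the cached batches first, then read the rest of the input.
        while cached:
            cached.popleft()
            consumed += 1
        for _ in source:
            consumed += 1
    log.append((name, "consumed", consumed))
    # The build side passes `done` and signals completion to the probe side.
    if done is not None:
        done.set()


def demo():
    gpu_lock = threading.Lock()
    build_done = threading.Event()
    log = []
    build_src = iter([b"x" * 100 for _ in range(5)])
    probe_src = iter([b"y" * 100 for _ in range(8)])
    probe = threading.Thread(
        target=run_side,
        args=("probe", probe_src, 300, gpu_lock, log),
        kwargs={"wait_for": build_done},
    )
    build = threading.Thread(
        target=run_side,
        args=("build", build_src, 300, gpu_lock, log),
        kwargs={"done": build_done},
    )
    probe.start()
    build.start()
    build.join()
    probe.join()
    return log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

In this sketch the probe side does not start prefetching until the build side has signalled completion, which mirrors the current behaviour described above.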

Test:
Tested locally on SF100, with the config adjusted to enable batch caching.

--conf spark.gluten.sql.columnar.backend.velox.cudf.batchSize=10000 \
--conf spark.gluten.sql.columnar.backend.velox.cudf.shuffleMaxPrefetchBytes=1024MB

The log prints: "Prefetched 171 batches (24057900 bytes) before blocking on GPU lock"
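
As a rough sanity check on the logged numbers, the sketch below computes the average batch size and the share of the prefetch budget that was used. It assumes that 1024MB is interpreted as 1024 MiB, as in Spark's usual byte-size strings; whether this particular config is parsed that way is an assumption.

```python
prefetched_batches = 171        # from the log line
prefetched_bytes = 24_057_900   # from the log line
budget_bytes = 1024 * 1024 * 1024  # assumption: "1024MB" means 1024 MiB

avg_batch_bytes = prefetched_bytes / prefetched_batches
budget_used = prefetched_bytes / budget_bytes

print(f"average batch size: {avg_batch_bytes:.0f} bytes")
print(f"share of budget used: {budget_used:.1%}")
```

Under that assumption each batch averages about 140 KB and the cache holds roughly 2% of the configured budget, which suggests the budget was not the limiting factor in this run.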

Next step:
Prefetch the probe-side batches as soon as the build starts.

Related issue: #10933
