[TASK-331] Scan results returned per bucket python/cpp#351
[TASK-331] Scan results returned per bucket python/cpp#351luoyuxia merged 6 commits intoapache:mainfrom
Conversation
f1e20c7 to
995e1e6
Compare
|
@leekeiabstraction @zhaohaidao PTAL 🙏 |
There was a problem hiding this comment.
Pull request overview
This PR aligns Python and C++ bindings with Rust/Java by returning ScanRecords grouped by bucket instead of a flat list from LogScanner.poll(). This addresses issue #331 which requested per-bucket grouping for scan results.
Changes:
- Introduced new
ScanRecordsclass in Python and C++ that groups records by bucket - Removed
bucketfield from individualScanRecordobjects (bucket context now provided by the grouping) - Added dict-like and sequence protocols to Python
ScanRecordsfor flexible access patterns - Added
BucketViewclass and per-bucket methods to C++ScanRecords - Updated all examples, documentation, and tests to demonstrate the new API
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| bindings/python/src/table.rs | Implements new ScanRecords pyclass with per-bucket grouping, removes bucket field from ScanRecord, adds dict-like and sequence protocols |
| bindings/python/src/lib.rs | Exports new ScanRecords class to Python module |
| bindings/python/fluss/init.pyi | Adds type stubs for ScanRecords with overloaded __getitem__ for different access patterns |
| bindings/python/example/example.py | Demonstrates both flat iteration and per-bucket access patterns with the new API |
| bindings/python/test/test_log_table.py | Updates partitioned table test to verify per-bucket grouping and partition consistency |
| bindings/cpp/include/fluss.hpp | Adds TableBucket struct, BucketView class, updates ScanRecords with per-bucket methods, removes bucket/partition fields from ScanRecord |
| bindings/cpp/src/table.cpp | Implements per-bucket iteration, RecordAt(), Buckets(), Records(), BucketAt() methods, updates RowView to use bucket+record indices |
| bindings/cpp/src/lib.rs | Restructures FFI layer to store buckets in Vec with FfiBucketInfo metadata, updates all accessor methods to use bucket+record indexing |
| bindings/cpp/examples/example.cpp | Updates all scan operations to use per-bucket iteration, demonstrates both flat and per-bucket access patterns |
| website/docs/user-guide/python/api-reference.md | Documents new ScanRecords class with methods, indexing, and mapping protocol; removes bucket field from ScanRecord |
| website/docs/user-guide/python/example/log-tables.md | Shows both flat iteration and per-bucket access in examples |
| website/docs/user-guide/cpp/api-reference.md | Documents new ScanRecords per-bucket methods, BucketView, and TableBucket; updates ScanRecord documentation |
| website/docs/user-guide/cpp/example/log-tables.md | Demonstrates per-bucket iteration pattern |
| website/docs/user-guide/rust/example/log-tables.md | Adds example showing per-bucket access for consistency with other bindings |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
995e1e6 to
737b07f
Compare
|
Rebased, addressed comments |
leekeiabstraction
left a comment
There was a problem hiding this comment.
TY for the PR! Left some small comments
|
@leekeiabstraction Ty for the review |
2c2d250 to
10edaf7
Compare
leekeiabstraction
left a comment
There was a problem hiding this comment.
TY for the PR! LGTM!
|
I changes some C++ API to match between clients:
|
|
we may wish to merge #353 first, it would simplify a couple of things here |
|
@fresh-borzoni Thanks for the pr. #353 is merged.. |
10edaf7 to
8029836
Compare
|
@luoyuxia Thank you! rebased, updated |
|
@leekeiabstraction Can you look through one more time, pls? |
8029836 to
82a5ced
Compare
82a5ced to
2e41f23
Compare
leekeiabstraction
left a comment
There was a problem hiding this comment.
Looks good to me, however as both languages are not my forte. Would suggest an extra pair of eyes reviewing
luoyuxia
left a comment
There was a problem hiding this comment.
@fresh-borzoni Thanks for the pr. Took a quick look, LGTM overall.
Summary
closes #331
Per-Bucket Grouped
ScanRecordsfor Python and C++ BindingsAligns Python and C++
LogScanner.poll()with Rust/Java by returningScanRecordsgrouped by bucket instead of a flat list.Changes
Python
ScanRecordspyclass with per-bucket access (buckets(),records(bucket),items(),values(),keys())scan_records[bucket],bucket in scan_recordsscan_records[0],scan_records[0:3],len(scan_records)for record in scan_recordsC++
TableBucketstruct andBucketViewclassScanRecords:BucketCount(),Buckets(),Records(bucket),BucketAt(idx)begin()/end()unchangedDocs & Examples