
[SPARK-56007][CONNECT] Fix ArrowDeserializer to use positional binding for rows #54832

Open
hvanhovell wants to merge 4 commits into apache:master from hvanhovell:SPARK-56007


@hvanhovell
Contributor

What changes were proposed in this pull request?

This PR switches RowEncoder deserialization in the Spark Connect Scala client from name-based lookup to positional binding to correctly handle duplicate column names.

Why are the changes needed?

The Spark Connect Scala client can't handle rows with duplicate column names. This is a regression w.r.t. classic.
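
To illustrate why name-based lookup breaks on duplicate column names, here is a minimal, Spark-free sketch. It is not the `ArrowDeserializer` code; `BindingSketch`, `Column`, `bindByName`, `bindByPosition`, and `readRow` are hypothetical names introduced only for this example.

```scala
// Hypothetical illustration only: none of these names come from Spark.
object BindingSketch {
  // A "column" is a name plus the values stored in that column.
  final case class Column(name: String, values: Vector[Any])

  // Name-based binding: build a name -> column lookup, then resolve each
  // schema field by name. With duplicate names, the last column with a
  // given name overwrites the earlier ones in the map.
  def bindByName(schema: Seq[String], columns: Seq[Column]): Seq[Column] = {
    val lookup = columns.map(c => c.name -> c).toMap
    schema.map(lookup)
  }

  // Positional binding: the i-th schema field is bound to the i-th column,
  // so duplicate names never collide.
  def bindByPosition(schema: Seq[String], columns: Seq[Column]): Seq[Column] = {
    require(
      schema.length == columns.length,
      "schema and data must have the same number of fields")
    schema.indices.map(columns)
  }

  // Read one row out of the bound columns.
  def readRow(bound: Seq[Column], rowIndex: Int): Seq[Any] =
    bound.map(_.values(rowIndex))
}
```

With two columns both named `a` holding `1` and `2`, `bindByName` resolves both fields to the second column, while `bindByPosition` keeps them distinct.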

Does this PR introduce any user-facing change?

Yes. It fixes a bug.

How was this patch tested?

I added tests to ArrowEncoderSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code v2.1.76

…g for RowEncoder and validate schema

Switch RowEncoder deserialization from name-based lookup to positional binding to correctly handle
duplicate column names. Add field-count and field-name mismatch error conditions with new tests.

Co-authored-by: Isaac
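
A rough sketch of what field-count and field-name checks could look like alongside positional binding. This assumes a strict, flat comparison in which both sides must agree exactly. The PR's actual rules, error conditions, and exception types are not shown in this conversation, so `SchemaCheckSketch`, `validate`, and the messages below are hypothetical.

```scala
// Hypothetical illustration only: the real error conditions live in Spark
// and are not reproduced here.
object SchemaCheckSketch {
  // Compare the field names the encoder expects with the field names found
  // in the data, position by position. Returns Left(message) on the first
  // problem and Right(()) when everything lines up.
  def validate(expected: Seq[String], actual: Seq[String]): Either[String, Unit] = {
    if (expected.length != actual.length) {
      Left(s"field count mismatch: expected ${expected.length}, got ${actual.length}")
    } else {
      expected.zip(actual).zipWithIndex.collectFirst {
        case ((e, a), i) if e != a =>
          s"field name mismatch at position $i: expected '$e', got '$a'"
      }.toLeft(())
    }
  }
}
```

Because names are compared per position, duplicate names such as `a, a` pass the check as long as the data carries the same names in the same order.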
Member

@HyukjinKwon HyukjinKwon left a comment


LGTM cc @zhengruifeng

Fix the `bind to schema` test:
- Correct `wideSchemaEncoder` (remove stray `a: int` field)
- Fix narrow schema field order (C before d) and element struct fields (da, db not da, dc)
- Supply complete wide-schema input rows (include dc boolean in d elements)
- Correct expected output to match narrow schema projection
- Add try/finally to ensure both iterators are always closed
- Fix `unknown field` to expect `SparkRuntimeException` not `AnalysisException`

Co-authored-by: Isaac
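
The try/finally item above can be sketched as follows. This is a generic pattern, not the test code from the PR; `CloseBothSketch`, `TrackedIterator`, and `withBoth` are hypothetical names, and the closeable iterator here merely stands in for whatever iterator types the suite actually uses.

```scala
// Hypothetical illustration only: not taken from ArrowEncoderSuite.
object CloseBothSketch {
  // An iterator that records whether close() was called.
  final class TrackedIterator[A](underlying: Iterator[A])
      extends Iterator[A] with AutoCloseable {
    private var closedFlag = false
    def isClosed: Boolean = closedFlag
    override def hasNext: Boolean = underlying.hasNext
    override def next(): A = underlying.next()
    override def close(): Unit = { closedFlag = true }
  }

  // Run `body` with both iterators and close both afterwards. The nested
  // try/finally ensures the second iterator is closed even if the body or
  // the first close() throws.
  def withBoth[A, B, R](first: TrackedIterator[A], second: TrackedIterator[B])(
      body: (Iterator[A], Iterator[B]) => R): R = {
    try body(first, second)
    finally {
      try first.close()
      finally second.close()
    }
  }
}
```

The nesting matters: a single `finally` that closes both in sequence would skip the second `close()` if the first one threw.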
