deps: test DataFusion 53.0#3629

Draft
mbutrovich wants to merge 17 commits into apache:main from mbutrovich:df53

mbutrovich commented Mar 4, 2026

Which issue does this PR close?

N/A.

Rationale for this change

Test against DataFusion 53.0.

What changes are included in this PR?

Dependency changes:

  • datafusion, datafusion-datasource, datafusion-physical-expr-adapter, datafusion-spark → git tag 53.0.0-rc2
  • datafusion-spark now with features = ["core"]
  • arrow 57.3.0 → 58.0.0
  • parquet 57.3.0 → 58.0.0
  • object_store 0.12.3 → 0.13.1
  • iceberg updated to latest df53-upgrade branch

API fixes:

  • ExecutionPlan::properties() returns &Arc<PlanProperties> — wrapped cache fields in Arc across 5 files
  • Removed ExecutionPlan::statistics() from parquet_writer and shuffle_writer
  • HashJoinExec::try_new takes new null_aware: bool param — added false
  • PhysicalExprAdapterFactory::create now returns Result — added ? and Ok()
  • EncryptionFactory methods return Result<...> instead of Result<..., DataFusionError>
  • DefaultPhysicalExprAdapterFactory::create now returns Result — added ?
  • Replaced func.coerce_types() with fields_with_udf() in create_scalar_function_expr — fixes type coercion for UDFs using Signature::coercible (e.g. array_repeat expecting Int64 count)
  • Registered SparkArrayRepeat from datafusion-spark for Spark-compatible null semantics
  • Migrated hdfs ObjectStore impl to object_store 0.13 API — removed trait methods moved to ObjectStoreExt (get, get_range, head, delete, copy, rename, copy_if_not_exists), added new required methods (delete_stream, copy_opts), rewrote get_ranges to use read_range directly
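
For readers unfamiliar with the shape of two of the fixes above, here is a minimal sketch of the `Arc`-wrapped properties cache and of a factory whose `create` became fallible. Every type and trait in it (`PlanProperties`, `ExecutionPlan`, `ExampleExec`, `AdapterFactory`, `Adapter`, `AdapterError`, `adapted_column_count`) is a simplified, hypothetical stand-in defined locally, not the real DataFusion API and not the code changed in this PR.

```rust
use std::sync::Arc;

// Hypothetical stand-in for DataFusion's PlanProperties (the real struct differs).
#[derive(Debug)]
struct PlanProperties {
    partition_count: usize,
}

// Hypothetical stand-in for the part of ExecutionPlan touched by this change.
trait ExecutionPlan {
    fn properties(&self) -> &Arc<PlanProperties>;
}

struct ExampleExec {
    // Wrapped in Arc so that properties() can hand out a reference to the Arc.
    cache: Arc<PlanProperties>,
}

impl ExampleExec {
    fn new(partition_count: usize) -> Self {
        Self {
            cache: Arc::new(PlanProperties { partition_count }),
        }
    }
}

impl ExecutionPlan for ExampleExec {
    fn properties(&self) -> &Arc<PlanProperties> {
        &self.cache
    }
}

// Hypothetical stand-ins for an adapter factory whose create() became fallible.
#[derive(Debug, PartialEq)]
struct AdapterError(String);

#[derive(Debug, PartialEq)]
struct Adapter {
    column_count: usize,
}

trait AdapterFactory {
    fn create(&self, column_count: usize) -> Result<Adapter, AdapterError>;
}

struct ExampleFactory;

impl AdapterFactory for ExampleFactory {
    fn create(&self, column_count: usize) -> Result<Adapter, AdapterError> {
        if column_count == 0 {
            return Err(AdapterError("schema has no columns".to_string()));
        }
        // An implementation that cannot fail would simply wrap its value in Ok().
        Ok(Adapter { column_count })
    }
}

// Call sites add `?` to propagate the new error path.
fn adapted_column_count(
    factory: &dyn AdapterFactory,
    column_count: usize,
) -> Result<usize, AdapterError> {
    let adapter = factory.create(column_count)?;
    Ok(adapter.column_count)
}
```

With this layout, a caller that needs to hold on to the properties can use `Arc::clone(exec.properties())`, which bumps a reference count instead of copying the struct.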

Feature-gated (blocked on upstream):

  • iceberg — gated behind iceberg feature (iceberg-rust still on arrow 57)
  • hdfs-opendal — removed from defaults (object_store_opendal still on 0.12, tracking opendal#7237)
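
As a rough illustration of what gating behind a Cargo feature looks like on the Rust side, the sketch below uses the `iceberg` and `hdfs-opendal` feature names from this description. The functions `iceberg_support` and `enabled_optional_features` are hypothetical helpers invented for the example; the actual gated modules in this PR are not shown here.

```rust
// Hypothetical helper: this version is compiled only when the `iceberg`
// feature is enabled for the crate.
#[cfg(feature = "iceberg")]
fn iceberg_support() -> &'static str {
    "enabled"
}

// Fallback compiled when the feature is off, so callers still build.
#[cfg(not(feature = "iceberg"))]
fn iceberg_support() -> &'static str {
    "disabled"
}

// Hypothetical helper listing which optional features were compiled in.
// cfg!() evaluates to a compile-time boolean, so both branches must type-check.
fn enabled_optional_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(feature = "iceberg") {
        features.push("iceberg");
    }
    if cfg!(feature = "hdfs-opendal") {
        features.push("hdfs-opendal");
    }
    features
}
```

Code that exists only for a gated feature tends to leave unused bindings behind when the feature is off, which is the usual source of warnings like the `runtime_env` one.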

Known test issue:

  • test_batch_coalescing_reduces_size fails due to debug assertion bug in arrow-select 58.0.0 (arrow-rs#9506, fixed in arrow-rs#9508)

Remaining warnings:

  • CoalesceBatchesExec is deprecated (needs separate migration)
  • runtime_env unused when hdfs-opendal is off (expected)

How are these changes tested?

Existing tests.

mbutrovich closed this Mar 4, 2026
mbutrovich reopened this Mar 17, 2026
mbutrovich changed the title from "deps: test DataFusion 53" to "deps: test DataFusion 53.0" Mar 17, 2026

mbutrovich commented Mar 17, 2026

So the shuffle failures are related to apache/arrow-rs#9506

I opened an upstream bug for hash join: apache/datafusion#20995

mbutrovich commented

Down to array_repeat as the only problematic expression at this point, I think.
