Add Python bindings for accessing ExecutionMetrics#1381

Open
ShreyeshArangath wants to merge 2 commits into apache:main from ShreyeshArangath:feat/support-metrics
ShreyeshArangath commented Feb 15, 2026

Which issue does this PR close?

Closes #1379

Rationale for this change

Today, DataFusion Python exposes execution metrics only as formatted console output from explain(analyze=True), which makes it difficult to inspect execution behavior programmatically.

There is currently no structured Python API for accessing per-operator metrics such as output_rows, elapsed_compute, spill_count, and the other runtime metrics collected during execution.

This PR introduces APIs to surface the execution metrics, mirroring the Rust API in datafusion::physical_plan::metrics.

What changes are included in this PR?

  • Added plan caching to PyDataFrame so the physical plan used during execution is retained and available for metrics access.
  • Kept the metrics() method and added a collect_metrics() helper that walks the execution plan tree and aggregates metrics from all operators.
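
To illustrate the tree walk described above, here is a minimal, self-contained sketch. It does not use the datafusion package: `PlanNode` is a hypothetical stand-in for a physical plan operator, and this `collect_metrics` function is an assumption about how such a helper might traverse the tree, not the implementation in this PR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan operator (not the datafusion API)."""
    name: str
    # None models an operator that recorded no metrics.
    metrics: Optional[dict] = None
    children: list = field(default_factory=list)


def collect_metrics(node: PlanNode) -> list:
    """Walk the plan tree in pre-order and gather (operator name, metrics) pairs."""
    collected = []
    if node.metrics is not None:
        collected.append((node.name, node.metrics))
    for child in node.children:
        collected.extend(collect_metrics(child))
    return collected


# A two-operator plan: a filter sitting on top of a data source.
plan = PlanNode(
    "FilterExec",
    {"output_rows": 2},
    [PlanNode("DataSourceExec", {"output_rows": 3})],
)
for operator_name, metrics in collect_metrics(plan):
    print(f"{operator_name}: {metrics['output_rows']} rows")
```

Pre-order traversal is used here so that operators are listed from the root of the plan downward; the actual ordering and return type in the PR may differ.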

Are there any user-facing changes?

Users can now access execution metrics programmatically:

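  from datafusion import SessionContext

  ctx = SessionContext()  # assumes a table "t" with a column "x" is already registered
  df = ctx.sql("SELECT * FROM t WHERE x > 1")
  df.collect()  # execute the query so that metrics are populated
  plan = df.execution_plan()  # the cached plan that was used during execution
  metrics = plan.collect_metrics()
  for operator_name, metrics_set in metrics:
      print(f"{operator_name}: {metrics_set.output_rows} rows")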

ShreyeshArangath changed the title from "feat: add Python bindings for accessing ExecutionMetrics" to "Add Python bindings for accessing ExecutionMetrics" Feb 15, 2026
ShreyeshArangath marked this pull request as ready for review February 15, 2026 01:53