feat: make execution transfomation application rule generic by askalt · Pull Request #25 · tarantool/datafusion

askalt · 2026-02-11T08:27:48Z

Currently, it depends on the particular execution transformation that resolves placeholders. However, it would be more convenient to be able to specify the rule which is required to be applied to the plan.

askalt · 2026-02-11T10:37:43Z

@LLDay please, check the patch.

LLDay · 2026-02-11T11:00:40Z

datafusion/core/tests/physical_optimizer/physical_expr_resolver.rs

We should rename this file as well.

LLDay · 2026-02-11T11:05:23Z

datafusion/physical-optimizer/src/exec_transform_apply.rs

+    /// This optimization checks if `rule` requires to transform the plan and wraps the plan with
+    /// [`TransformPlanExec`] if it so, or adds rule to the existing transformation node.
+    Post {
+        rule: Arc<dyn ExecutionTransformationRule>,


We should have a vec of rules here, because multiple optimizations with a single rule each are slower than a single optimization with multiple rules.

We could express it with a rule combinator, i.e. write a rule that will take list of rules and answer "matches" if there is some rule in the list answering "matches".

if let Some(transformer) = plan.as_any().downcast_ref::<TransformPlanExec>() { let has_rule = transformer.has_dyn_rule(rule); if has_rule { // Rule is already applied. Ok(plan)

Then has_rule check will not work properly.

I mean, we make a rule:

struct RuleCombinator { rules: Vec<Arc<dyn ExecutionTransformationRule>>, }

And then use single ExecutionTransformationApplier(RuleCombinator) in optimization pipeline.

If we want to have a RuleCombinator, then TransformPlanExec should also have one rule. Now TransformPlanExec contains a vec of rules. Otherwise it will be a semantic mess.

RuleCombinator will make it more difficult to find specific rules. We will need to go through all the nested rules to check if there is a ResolvePlaceholdersRule in TransformPlanExec.

If we want to have a RuleCombinator, then TransformPlanExec should also have one rule. Now TransformPlanExec contains a vec of rules. Otherwise it will be a semantic mess.

It makes sense for me.

We will need to go through all the nested rules to check if there is a ResolvePlaceholdersRule in TransformPlanExec.

This check (if ResolvePlaceholdersRule in TransformPlanExec) is needed, as I understand, if we want to mix different ExecutionTransformationApplier (have several different such rules in optimizer rules). My point is: let's use one fixed rule ExecutionTransformationApplier(transformation) where transformation does all what we need to do before execution.

It makes sense for me.

Sorry, did you mean that you think the current solution is ok (one rule in optimizer, vec of rules for the exec plan), or that you also think that we should have a consistent organization of rules?

let's use one fixed rule ExecutionTransformationApplier(transformation) where transformation does all what we need to do before execution.

I don't mind having a one combined rule. But I would like to have a consistent approach to combining the rules in optimizer and execution plan.

I just want to avoid this situation when we have duplicates.

TransformPlanExec { rules: [ RuleCombinator(SomeRule1, SomeRule2, ResolvePlaceholdersRule), RuleCombinator(ResolvePlaceholdersRule), SomeRule3 ] }

transformer.has_dyn_rule(ResolvePlaceholdersRule) will return false.

Ok, it seems, I finally got what you mean. When we see TransformPlanExec on post-phase, we need a way to combine transformations if there is no placeholder resolver. But why we even consider the case when TransformPlanExec is present on this phase? In the typical pipeline all such execs will be removed on pre-phase and hence we never see TransformPlanExec in post-phase. So the question is -- do we need to handle this rare case, complicating the code by introducing rules vec?

LLDay · 2026-02-11T11:27:06Z

datafusion/physical-plan/src/plan_transformer/mod.rs

-
    /// Checks if the given [`ExecutionPlan`] node matches the criteria for this rule.
-    fn matches(&mut self, _node: &Arc<dyn ExecutionPlan>) -> Result<bool> {
+    fn matches(&self, _node: &Arc<dyn ExecutionPlan>) -> Result<bool> {


The idea was to allow mutations for possible optimization. For example, ResolvePlaceholdersRule may construct a mapping plan_index -> expr_indices that will skip expressions without placeholders. For some reason, I didn't implement it, but that was the idea originally. Now we have a shared reference, and all the rules are stateless. You can resolve this thread if you think this API is suitable for our purposes.

askalt · 2026-02-11T18:39:42Z

Also noticed that when we re-execute a plan, we need to someway clear TransformPlanExec metrics itself.

askalt · 2026-02-11T19:01:38Z

Also noticed that when we re-execute a plan, we need to someway clear TransformPlanExec metrics itself.

Considering it, I think, we should not perform re-use plan actions in the separate node (miss-design from my side, it seems we a bit overengineered here). Plan should contain nodes which manipulate with data and are important for user to debug query performance.

I would simplify the code, removing introduced transformations. Let's apply all transformations required to re-execute plan and substitute placeholders separately (e.g., reset_plan_states from DF).

LLDay · 2026-02-12T09:39:35Z

Let's apply all transformations required to re-execute plan and substitute placeholders separately

Are you suggesting adding a separate method that simply resolves placeholders? And we will call this method every time before executing the plan?

Plan should contain nodes which manipulate with data and are important for user to debug query performance.

Yes, final plan contains only meaningful nodes. But our TransformPlanExec can be an ancillary node as OutputRequirementExec. We can prohibit calling execute on TransformPlanExec. I'm just offering alternative ideas, I'm not against a separate method that resolves placeholders and simplifies the code.

askalt · 2026-02-12T10:03:42Z

Let's apply all transformations required to re-execute plan and substitute placeholders separately

Are you suggesting adding a separate method that simply resolves placeholders? And we will call this method every time before executing the plan?

Plan should contain nodes which manipulate with data and are important for user to debug query performance.

Yes, final plan contains only meaningful nodes. But our TransformPlanExec can be an ancillary node as OutputRequirementExec. We can prohibit calling execute on TransformPlanExec. I'm just offering alternative ideas, I'm not against a separate method that resolves placeholders and simplifies the code.

Currently, I have the following arguments against the plan node:

(1) with_new_children semantic: this node is created for the fixed input plan and it is a fragile invariant that tree form must be preserved to not break indexes. I see that it is created on post-phase when tree-form typically is not modified, but I would to avoid such requirements, if we can.
(2) This node is always kept as a plan-tree root. So it is a miss-use to have it as a child of some node (it probably would work, but it is not the pattern how we want to use it). This causes a question why we even allow to keep as a child, implementing ExecutionPlan for it.
(3) It pollutes the plan.
(4) We anyway should reset its metrics when we re-execute the plan.

Yes, final plan contains only meaningful nodes. But our TransformPlanExec can be an ancillary node as OutputRequirementExec. We can prohibit calling execute on TransformPlanExec. I'm just offering alternative ideas, I'm not against a separate method that resolves placeholders and simplifies the code.

Yes, but due to (2) I suggest to implement a wrapper that will not an ExecutionPlan itself. Every user could implement custom wrapper based on the physical_expressions(...) API and actions required to do before the execution. So IMO generalization here would be over-kill (I reconsidered my opinion).

In DF itself we could leave basic wrapper that will allow to re-execute plan with placeholders, e.g. (will push after run some benchmarks):

/// Wraps an [`ExecutionPlan`] that may contain placeholders, making
/// it re-executable, avoiding re-planning stage.
///
/// # Limitations
///
/// Plan does not support re-execution if it (OR):
///
/// * uses dynamic filters,
/// * represents a recursive query.
///
/// This invariant is not enforced by [`ReusableExecutionPlan`] itself, so it
/// must be checked by user.
///
pub struct ReusableExecutionPlan {
    binder: Binder,
    bound_plan: Option<Arc<dyn ExecutionPlan>>,
}

impl ReusableExecutionPlan {
    /// Make a new [`ReusableExecutionPlan`] bound to the passed `plan`.
    pub fn new(plan: Arc<dyn ExecutionPlan>) -> Self {
        let binder = Binder::new(plan);
        Self {
            binder,
            bound_plan: None,
        }
    }

    /// Return a ready to execution instance of [`ReusableExecutionPlan`] where
    /// placeholders are bound to the passed `params`.
    pub fn bind(&self, params: Option<&ParamValues>) -> Result<Self> {
        let bound_plan = self.binder.bind(params).map(Some)?;
        Ok(Self {
            binder: self.binder.clone(),
            bound_plan,
        })
    }

    /// Return an inner plan to execute.
    ///
    /// If this plan is a result of [`Self::bind`] call, then bound plan is returned.
    /// Otherwise, an initial plan is returned.
    pub fn plan(&self) -> Arc<dyn ExecutionPlan> {
        self.bound_plan
            .clone()
            .unwrap_or_else(|| Arc::clone(&self.binder.plan))
    }
}

impl From<Arc<dyn ExecutionPlan>> for ReusableExecutionPlan {
    fn from(plan: Arc<dyn ExecutionPlan>) -> Self {
        Self::new(plan)
    }
}

#[derive(Debug, Clone)]
struct NodeWithPlaceholders {
    /// The index of the node in the tree traversal.
    idx: usize,
    /// Positions of the placeholders among plan physical expressions.
    placeholder_idx: Vec<usize>,
}

impl NodeWithPlaceholders {
    /// Returns [`Some`] if passed `node` contains placeholders and must
    /// be resolved on binding stage.
    fn new(node: &Arc<dyn ExecutionPlan>, idx: usize) -> Option<Self> {
        let placeholder_idx = if let Some(iter) = node.physical_expressions() {
            iter.enumerate()
                .filter_map(|(i, expr)| {
                    if has_placeholders(&expr) {
                        Some(i)
                    } else {
                        None
                    }
                })
                .collect()
        } else {
            vec![]
        };

        if placeholder_idx.is_empty() {
            None
        } else {
            Some(Self {
                idx,
                placeholder_idx,
            })
        }
    }

    fn resolve(
        &self,
        node: &Arc<dyn ExecutionPlan>,
        params: Option<&ParamValues>,
    ) -> Result<Arc<dyn ExecutionPlan>> {
        let Some(expr) = node.physical_expressions() else {
            return exec_err!("node {} does not support expressions export", node.name());
        };
        let mut exprs: Vec<Arc<dyn PhysicalExpr>> = expr.collect();
        for idx in self.placeholder_idx.iter() {
            exprs[*idx] = resolve_expr_placeholders(Arc::clone(&exprs[*idx]), params)?;
        }
        let Some(resolved_node) =
            node.with_physical_expressions(ReplacePhysicalExpr { exprs })?
        else {
            return exec_err!(
                "node {} does not support expressions replace",
                node.name()
            );
        };
        Ok(resolved_node)
    }
}

/// Helper to bound placeholders and reset plan nodes state.
#[derive(Clone)]
struct Binder {
    /// Created during [`Binder`] construction.
    /// This way we avoid runtime rebuild for expressions without placeholders.
    nodes_to_resolve: Arc<[NodeWithPlaceholders]>,
    plan: Arc<dyn ExecutionPlan>,
}

impl Binder {
    fn new(plan: Arc<dyn ExecutionPlan>) -> Self {
        let mut nodes_to_resolve = vec![];
        let mut cursor = 0;

        plan.apply(|node| {
            let idx = cursor;
            cursor += 1;
            if let Some(node) = NodeWithPlaceholders::new(node, idx) {
                nodes_to_resolve.push(node);
            }
            Ok(TreeNodeRecursion::Continue)
        })
        .unwrap();

        Self {
            plan,
            nodes_to_resolve: nodes_to_resolve.into(),
        }
    }

    fn bind(&self, params: Option<&ParamValues>) -> Result<Arc<dyn ExecutionPlan>> {
        let mut cursor = 0;
        let mut resolve_node_idx = 0;
        Arc::clone(&self.plan)
            .transform_down(|node| {
                let idx = cursor;
                cursor += 1;
                if resolve_node_idx < self.nodes_to_resolve.len()
                    && self.nodes_to_resolve[resolve_node_idx].idx == idx
                {
                    // Note: `resolve` replases plan expressions, which also resets a plan state.
                    let resolved_node =
                        self.nodes_to_resolve[resolve_node_idx].resolve(&node, params)?;
                    resolve_node_idx += 1;
                    Ok(Transformed::yes(resolved_node))
                } else {
                    // Reset state.
                    Ok(Transformed::yes(node.reset_state()?))
                }
            })
            .map(|tnr| tnr.data)
    }
}

github-actions bot added core optimizer proto physical-plan sqllogictest datasource labels Feb 11, 2026

askalt force-pushed the askalt/generalize-execution-rules-applier branch from 5c2f852 to e807be9 Compare February 11, 2026 08:28

github-actions bot removed sqllogictest datasource labels Feb 11, 2026

askalt force-pushed the askalt/generalize-execution-rules-applier branch from e807be9 to 316fd62 Compare February 11, 2026 08:57

github-actions bot added the datasource label Feb 11, 2026

askalt force-pushed the askalt/generalize-execution-rules-applier branch from 316fd62 to c87dd45 Compare February 11, 2026 09:12

feat: make execution transfomation application rule generic

d213820

Currently, it depends on the particular execution transformation that resolves placeholders. However, it would be more convenient to be able to specify the rule which is required to be applied to the plan.

askalt force-pushed the askalt/generalize-execution-rules-applier branch from c87dd45 to d213820 Compare February 11, 2026 10:15

github-actions bot added the sqllogictest label Feb 11, 2026

LLDay reviewed Feb 11, 2026

View reviewed changes

Conversation

askalt commented Feb 11, 2026

Uh oh!

askalt commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LLDay Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

LLDay Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

askalt Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

LLDay Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

askalt Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LLDay Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

askalt Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LLDay Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

LLDay Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

askalt Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LLDay Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

askalt commented Feb 11, 2026

Uh oh!

askalt commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LLDay commented Feb 12, 2026

Uh oh!

askalt commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

askalt commented Feb 11, 2026 •

edited

Loading

askalt Feb 11, 2026 •

edited

Loading

LLDay Feb 11, 2026 •

edited

Loading

askalt Feb 11, 2026 •

edited

Loading

LLDay Feb 11, 2026 •

edited

Loading

askalt Feb 11, 2026 •

edited

Loading

LLDay Feb 11, 2026 •

edited

Loading

askalt commented Feb 11, 2026 •

edited

Loading

askalt commented Feb 12, 2026 •

edited

Loading