Skip to content

feat: make execution transfomation application rule generic#25

Open
askalt wants to merge 1 commit intoaskalt/bump-candidatefrom
askalt/generalize-execution-rules-applier
Open

feat: make execution transfomation application rule generic#25
askalt wants to merge 1 commit intoaskalt/bump-candidatefrom
askalt/generalize-execution-rules-applier

Conversation

@askalt
Copy link

@askalt askalt commented Feb 11, 2026

Currently, it depends on the particular execution transformation that resolves placeholders. However, it would be more convenient to be able to specify the rule which is required to be applied to the plan.

@askalt askalt force-pushed the askalt/generalize-execution-rules-applier branch from 5c2f852 to e807be9 Compare February 11, 2026 08:28
@askalt askalt force-pushed the askalt/generalize-execution-rules-applier branch from e807be9 to 316fd62 Compare February 11, 2026 08:57
@askalt askalt force-pushed the askalt/generalize-execution-rules-applier branch from 316fd62 to c87dd45 Compare February 11, 2026 09:12
Currently, it depends on the particular execution transformation
that resolves placeholders. However, it would be more convenient
to be able to specify the rule which is required to be applied to
the plan.
@askalt askalt force-pushed the askalt/generalize-execution-rules-applier branch from c87dd45 to d213820 Compare February 11, 2026 10:15
@askalt
Copy link
Author

askalt commented Feb 11, 2026

@LLDay please, check the patch.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rename this file as well.

/// This optimization checks if `rule` requires to transform the plan and wraps the plan with
/// [`TransformPlanExec`] if it so, or adds rule to the existing transformation node.
Post {
rule: Arc<dyn ExecutionTransformationRule>,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a vec of rules here, because multiple optimizations with a single rule each are slower than a single optimization with multiple rules.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could express it with a rule combinator, i.e. write a rule that will take list of rules and answer "matches" if there is some rule in the list answering "matches".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

             if let Some(transformer) =
                    plan.as_any().downcast_ref::<TransformPlanExec>()
                {
                    let has_rule = transformer.has_dyn_rule(rule);
                    if has_rule {
                        // Rule is already applied.
                        Ok(plan)

Then has_rule check will not work properly.

Copy link
Author

@askalt askalt Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we make a rule:

struct RuleCombinator {
   rules: Vec<Arc<dyn ExecutionTransformationRule>>,
}

And then use single ExecutionTransformationApplier(RuleCombinator) in optimization pipeline.

Copy link

@LLDay LLDay Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to have a RuleCombinator, then TransformPlanExec should also have one rule. Now TransformPlanExec contains a vec of rules. Otherwise it will be a semantic mess.

RuleCombinator will make it more difficult to find specific rules. We will need to go through all the nested rules to check if there is a ResolvePlaceholdersRule in TransformPlanExec.

Copy link
Author

@askalt askalt Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to have a RuleCombinator, then TransformPlanExec should also have one rule. Now TransformPlanExec contains a vec of rules. Otherwise it will be a semantic mess.

It makes sense for me.

We will need to go through all the nested rules to check if there is a ResolvePlaceholdersRule in TransformPlanExec.

This check (if ResolvePlaceholdersRule in TransformPlanExec) is needed, as I understand, if we want to mix different ExecutionTransformationApplier (have several different such rules in optimizer rules). My point is: let's use one fixed rule ExecutionTransformationApplier(transformation) where transformation does all what we need to do before execution.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense for me.

Sorry, did you mean that you think the current solution is ok (one rule in optimizer, vec of rules for the exec plan), or that you also think that we should have a consistent organization of rules?

let's use one fixed rule ExecutionTransformationApplier(transformation) where transformation does all what we need to do before execution.

I don't mind having a one combined rule. But I would like to have a consistent approach to combining the rules in optimizer and execution plan.

Copy link

@LLDay LLDay Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to avoid this situation when we have duplicates.

TransformPlanExec {
    rules: [
        RuleCombinator(SomeRule1, SomeRule2, ResolvePlaceholdersRule),
        RuleCombinator(ResolvePlaceholdersRule),
        SomeRule3
    ]
}

transformer.has_dyn_rule(ResolvePlaceholdersRule) will return false.

Copy link
Author

@askalt askalt Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it seems, I finally got what you mean. When we see TransformPlanExec on post-phase, we need a way to combine transformations if there is no placeholder resolver. But why we even consider the case when TransformPlanExec is present on this phase? In the typical pipeline all such execs will be removed on pre-phase and hence we never see TransformPlanExec in post-phase. So the question is -- do we need to handle this rare case, complicating the code by introducing rules vec?


/// Checks if the given [`ExecutionPlan`] node matches the criteria for this rule.
fn matches(&mut self, _node: &Arc<dyn ExecutionPlan>) -> Result<bool> {
fn matches(&self, _node: &Arc<dyn ExecutionPlan>) -> Result<bool> {
Copy link

@LLDay LLDay Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to allow mutations for possible optimization. For example, ResolvePlaceholdersRule may construct a mapping plan_index -> expr_indices that will skip expressions without placeholders. For some reason, I didn't implement it, but that was the idea originally. Now we have a shared reference, and all the rules are stateless. You can resolve this thread if you think this API is suitable for our purposes.

@askalt
Copy link
Author

askalt commented Feb 11, 2026

Also noticed that when we re-execute a plan, we need to someway clear TransformPlanExec metrics itself.

@askalt
Copy link
Author

askalt commented Feb 11, 2026

Also noticed that when we re-execute a plan, we need to someway clear TransformPlanExec metrics itself.

Considering it, I think, we should not perform re-use plan actions in the separate node (miss-design from my side, it seems we a bit overengineered here). Plan should contain nodes which manipulate with data and are important for user to debug query performance.

I would simplify the code, removing introduced transformations. Let's apply all transformations required to re-execute plan and substitute placeholders separately (e.g., reset_plan_states from DF).

@LLDay
Copy link

LLDay commented Feb 12, 2026

Let's apply all transformations required to re-execute plan and substitute placeholders separately

Are you suggesting adding a separate method that simply resolves placeholders? And we will call this method every time before executing the plan?

Plan should contain nodes which manipulate with data and are important for user to debug query performance.

Yes, final plan contains only meaningful nodes. But our TransformPlanExec can be an ancillary node as OutputRequirementExec. We can prohibit calling execute on TransformPlanExec. I'm just offering alternative ideas, I'm not against a separate method that resolves placeholders and simplifies the code.

@askalt
Copy link
Author

askalt commented Feb 12, 2026

Let's apply all transformations required to re-execute plan and substitute placeholders separately

Are you suggesting adding a separate method that simply resolves placeholders? And we will call this method every time before executing the plan?

Plan should contain nodes which manipulate with data and are important for user to debug query performance.

Yes, final plan contains only meaningful nodes. But our TransformPlanExec can be an ancillary node as OutputRequirementExec. We can prohibit calling execute on TransformPlanExec. I'm just offering alternative ideas, I'm not against a separate method that resolves placeholders and simplifies the code.

Currently, I have the following arguments against the plan node:

(1) with_new_children semantic: this node is created for the fixed input plan and it is a fragile invariant that tree form must be preserved to not break indexes. I see that it is created on post-phase when tree-form typically is not modified, but I would to avoid such requirements, if we can.
(2) This node is always kept as a plan-tree root. So it is a miss-use to have it as a child of some node (it probably would work, but it is not the pattern how we want to use it). This causes a question why we even allow to keep as a child, implementing ExecutionPlan for it.
(3) It pollutes the plan.
(4) We anyway should reset its metrics when we re-execute the plan.

Yes, final plan contains only meaningful nodes. But our TransformPlanExec can be an ancillary node as OutputRequirementExec. We can prohibit calling execute on TransformPlanExec. I'm just offering alternative ideas, I'm not against a separate method that resolves placeholders and simplifies the code.

Yes, but due to (2) I suggest to implement a wrapper that will not an ExecutionPlan itself. Every user could implement custom wrapper based on the physical_expressions(...) API and actions required to do before the execution. So IMO generalization here would be over-kill (I reconsidered my opinion).

In DF itself we could leave basic wrapper that will allow to re-execute plan with placeholders, e.g. (will push after run some benchmarks):

/// Wraps an [`ExecutionPlan`] that may contain placeholders, making
/// it re-executable, avoiding re-planning stage.
///
/// # Limitations
///
/// Plan does not support re-execution if it (OR):
///
/// * uses dynamic filters,
/// * represents a recursive query.
///
/// This invariant is not enforced by [`ReusableExecutionPlan`] itself, so it
/// must be checked by user.
///
pub struct ReusableExecutionPlan {
    binder: Binder,
    bound_plan: Option<Arc<dyn ExecutionPlan>>,
}

impl ReusableExecutionPlan {
    /// Make a new [`ReusableExecutionPlan`] bound to the passed `plan`.
    pub fn new(plan: Arc<dyn ExecutionPlan>) -> Self {
        let binder = Binder::new(plan);
        Self {
            binder,
            bound_plan: None,
        }
    }

    /// Return a ready to execution instance of [`ReusableExecutionPlan`] where
    /// placeholders are bound to the passed `params`.
    pub fn bind(&self, params: Option<&ParamValues>) -> Result<Self> {
        let bound_plan = self.binder.bind(params).map(Some)?;
        Ok(Self {
            binder: self.binder.clone(),
            bound_plan,
        })
    }

    /// Return an inner plan to execute.
    ///
    /// If this plan is a result of [`Self::bind`] call, then bound plan is returned.
    /// Otherwise, an initial plan is returned.
    pub fn plan(&self) -> Arc<dyn ExecutionPlan> {
        self.bound_plan
            .clone()
            .unwrap_or_else(|| Arc::clone(&self.binder.plan))
    }
}

impl From<Arc<dyn ExecutionPlan>> for ReusableExecutionPlan {
    fn from(plan: Arc<dyn ExecutionPlan>) -> Self {
        Self::new(plan)
    }
}

#[derive(Debug, Clone)]
struct NodeWithPlaceholders {
    /// The index of the node in the tree traversal.
    idx: usize,
    /// Positions of the placeholders among plan physical expressions.
    placeholder_idx: Vec<usize>,
}

impl NodeWithPlaceholders {
    /// Returns [`Some`] if passed `node` contains placeholders and must
    /// be resolved on binding stage.
    fn new(node: &Arc<dyn ExecutionPlan>, idx: usize) -> Option<Self> {
        let placeholder_idx = if let Some(iter) = node.physical_expressions() {
            iter.enumerate()
                .filter_map(|(i, expr)| {
                    if has_placeholders(&expr) {
                        Some(i)
                    } else {
                        None
                    }
                })
                .collect()
        } else {
            vec![]
        };

        if placeholder_idx.is_empty() {
            None
        } else {
            Some(Self {
                idx,
                placeholder_idx,
            })
        }
    }

    fn resolve(
        &self,
        node: &Arc<dyn ExecutionPlan>,
        params: Option<&ParamValues>,
    ) -> Result<Arc<dyn ExecutionPlan>> {
        let Some(expr) = node.physical_expressions() else {
            return exec_err!("node {} does not support expressions export", node.name());
        };
        let mut exprs: Vec<Arc<dyn PhysicalExpr>> = expr.collect();
        for idx in self.placeholder_idx.iter() {
            exprs[*idx] = resolve_expr_placeholders(Arc::clone(&exprs[*idx]), params)?;
        }
        let Some(resolved_node) =
            node.with_physical_expressions(ReplacePhysicalExpr { exprs })?
        else {
            return exec_err!(
                "node {} does not support expressions replace",
                node.name()
            );
        };
        Ok(resolved_node)
    }
}

/// Helper to bound placeholders and reset plan nodes state.
#[derive(Clone)]
struct Binder {
    /// Created during [`Binder`] construction.
    /// This way we avoid runtime rebuild for expressions without placeholders.
    nodes_to_resolve: Arc<[NodeWithPlaceholders]>,
    plan: Arc<dyn ExecutionPlan>,
}

impl Binder {
    fn new(plan: Arc<dyn ExecutionPlan>) -> Self {
        let mut nodes_to_resolve = vec![];
        let mut cursor = 0;

        plan.apply(|node| {
            let idx = cursor;
            cursor += 1;
            if let Some(node) = NodeWithPlaceholders::new(node, idx) {
                nodes_to_resolve.push(node);
            }
            Ok(TreeNodeRecursion::Continue)
        })
        .unwrap();

        Self {
            plan,
            nodes_to_resolve: nodes_to_resolve.into(),
        }
    }

    fn bind(&self, params: Option<&ParamValues>) -> Result<Arc<dyn ExecutionPlan>> {
        let mut cursor = 0;
        let mut resolve_node_idx = 0;
        Arc::clone(&self.plan)
            .transform_down(|node| {
                let idx = cursor;
                cursor += 1;
                if resolve_node_idx < self.nodes_to_resolve.len()
                    && self.nodes_to_resolve[resolve_node_idx].idx == idx
                {
                    // Note: `resolve` replases plan expressions, which also resets a plan state.
                    let resolved_node =
                        self.nodes_to_resolve[resolve_node_idx].resolve(&node, params)?;
                    resolve_node_idx += 1;
                    Ok(Transformed::yes(resolved_node))
                } else {
                    // Reset state.
                    Ok(Transformed::yes(node.reset_state()?))
                }
            })
            .map(|tnr| tnr.data)
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants