perf(planner): Skip unnecessary processing for leaf expressions and trivial predicates in wide projections #27544

Closed
feilong-liu wants to merge 1 commit into prestodb:master from feilong-liu:wide-column-additional-optimizations

@feilong-liu feilong-liu commented Apr 8, 2026

Description

Three targeted fast-path optimizations for wide-column queries (e.g., FAB BA capping query with ~2225 column assignments):

  1. SymbolMapper.map(RowExpression): Skip TreeRewriter instantiation for leaf expressions (VariableReferenceExpression and ConstantExpression). For identity assignments, call map(VariableReferenceExpression) directly instead of creating an anonymous RowExpressionRewriter + TreeRewriter.

  2. PruneUnreferencedOutputs.visitProject: Short-circuit VariablesExtractor.extractUnique() for VariableReferenceExpression assignments by directly adding the variable, avoiding ImmutableList.Builder, visitor traversal, and ImmutableSet allocation.

  3. PredicatePushDown.visitProject: Early return when inherited predicate is TRUE_CONSTANT (no conjuncts to push), skipping isDeterministic evaluation on all assignments.
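The first item can be illustrated with a minimal sketch. The `Expr`, `Variable`, `Constant`, and `Call` types and the `treeRewrites` counter are hypothetical stand-ins, not Presto's classes, and the sketch assumes an unmapped variable is returned unchanged. It only shows the shape of the idea: leaves are answered before the general traversal is set up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the planner types; these are NOT Presto's real classes.
final class SymbolMapperSketch {
    interface Expr {}

    static final class Variable implements Expr {
        final String name;
        Variable(String name) { this.name = name; }
    }

    static final class Constant implements Expr {
        final Object value;
        Constant(Object value) { this.value = value; }
    }

    static final class Call implements Expr {
        final String function;
        final Expr[] arguments;
        Call(String function, Expr... arguments) {
            this.function = function;
            this.arguments = arguments;
        }
    }

    private final Map<String, String> mapping;
    int treeRewrites = 0; // how many times the general (slow) path ran

    SymbolMapperSketch(Map<String, String> mapping) {
        this.mapping = mapping;
    }

    // Assumed behavior: an unmapped variable is returned unchanged.
    Variable map(Variable variable) {
        String target = mapping.get(variable.name);
        return target == null ? variable : new Variable(target);
    }

    Expr map(Expr expression) {
        // Fast paths: leaf expressions need no tree traversal.
        if (expression instanceof Variable) {
            return map((Variable) expression);
        }
        if (expression instanceof Constant) {
            return expression;
        }
        // General path: stands in for the tree-rewriter traversal.
        treeRewrites++;
        return rewrite(expression);
    }

    private Expr rewrite(Expr expression) {
        if (expression instanceof Variable) {
            return map((Variable) expression);
        }
        if (expression instanceof Call) {
            Call call = (Call) expression;
            Expr[] rewritten = new Expr[call.arguments.length];
            for (int i = 0; i < rewritten.length; i++) {
                rewritten[i] = rewrite(call.arguments[i]);
            }
            return new Call(call.function, rewritten);
        }
        return expression; // constants and anything else pass through
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("a", "b");
        SymbolMapperSketch mapper = new SymbolMapperSketch(mapping);
        Expr leaf = new Variable("a");
        Expr mapped = mapper.map(leaf);
        // The leaf is mapped without touching the general path.
        System.out.println(((Variable) mapped).name + " " + mapper.treeRewrites); // prints "b 0"
    }
}
```

In this sketch the general path still handles variables nested inside calls, so the fast path changes only where the work happens, not the result.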

Motivation and Context

For wide projections with ~2225 assignments, the majority are passthrough identity variables (col := col). Each of these was going through expensive processing paths that produce no useful work:

  • SymbolMapper instantiated a full RowExpressionTreeRewriter + anonymous RowExpressionRewriter for every leaf expression
  • PruneUnreferencedOutputs ran full VariablesExtractor.extractUnique() with visitor traversal for simple variable references
  • PredicatePushDown evaluated isDeterministic on all assignments even when the inherited predicate is trivially TRUE
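The PruneUnreferencedOutputs case can be sketched in the same spirit. Everything here is a hypothetical stand-in (`Expr`, `Variable`, `Call`, `extractUnique`, and the `extractorCalls` counter are assumptions for illustration, not the real planner API); the point is only that a plain variable reference contributes exactly itself to the expected inputs, so the general extractor can be skipped for it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; these are NOT Presto's real planner classes.
final class PruneInputsSketch {
    interface Expr {}

    static final class Variable implements Expr {
        final String name;
        Variable(String name) { this.name = name; }
    }

    static final class Call implements Expr {
        final List<Expr> arguments;
        Call(Expr... arguments) { this.arguments = Arrays.asList(arguments); }
    }

    int extractorCalls = 0; // how many times the general extractor ran

    // General path: walks the expression and collects every referenced variable.
    Set<String> extractUnique(Expr expression) {
        extractorCalls++;
        Set<String> variables = new LinkedHashSet<>();
        collect(expression, variables);
        return variables;
    }

    private static void collect(Expr expression, Set<String> out) {
        if (expression instanceof Variable) {
            out.add(((Variable) expression).name);
        } else if (expression instanceof Call) {
            for (Expr argument : ((Call) expression).arguments) {
                collect(argument, out);
            }
        }
    }

    // Inputs needed to compute only the outputs that something downstream references.
    Set<String> expectedInputs(Map<String, Expr> assignments, Set<String> referencedOutputs) {
        Set<String> inputs = new LinkedHashSet<>();
        for (Map.Entry<String, Expr> assignment : assignments.entrySet()) {
            if (!referencedOutputs.contains(assignment.getKey())) {
                continue; // unreferenced output, pruned
            }
            Expr expression = assignment.getValue();
            if (expression instanceof Variable) {
                inputs.add(((Variable) expression).name); // fast path for col := col
            } else {
                inputs.addAll(extractUnique(expression)); // general path
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        Map<String, Expr> assignments = new LinkedHashMap<>();
        assignments.put("a", new Variable("a"));
        assignments.put("b", new Variable("b"));
        assignments.put("sum", new Call(new Variable("x"), new Variable("y")));
        Set<String> referenced = new LinkedHashSet<>(Arrays.asList("a", "sum"));

        PruneInputsSketch sketch = new PruneInputsSketch();
        System.out.println(sketch.expectedInputs(assignments, referenced) + " " + sketch.extractorCalls);
        // prints "[a, x, y] 1": only the non-leaf assignment used the extractor
    }
}
```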

Impact

Reduces optimizer overhead for queries with wide projections. No functional behavior changes — all optimizations are pure fast paths that produce identical results.

Test Plan

  • Added testMapRowExpressionVariable, testMapRowExpressionConstant, testMapRowExpressionUnmappedVariable in TestSymbolMapper
  • Added testProjectWithVariableReferenceAssignments in TestPruneUnreferencedOutputs
  • Added testNoPredicateOverProjection in TestPredicatePushdown

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== RELEASE NOTES ==
General Changes
* Improve query planning performance for wide projections by skipping unnecessary processing for leaf expressions and trivial predicates in SymbolMapper, PruneUnreferencedOutputs, and PredicatePushDown.

@prestodb-ci prestodb-ci added the from:Meta (PR from Meta) label Apr 8, 2026
@feilong-liu feilong-liu requested review from a team and jaystarshot as code owners April 8, 2026 22:11
sourcery-ai bot commented Apr 8, 2026

Reviewer's Guide

Introduces targeted fast-path optimizations in the query planner for wide-column project nodes by short‑circuiting common cases in symbol mapping, pruning unreferenced outputs, and predicate pushdown without changing semantics.

File-Level Changes

Change: Add fast-path handling for leaf RowExpressions in SymbolMapper.map to avoid unnecessary TreeRewriter usage.
Details:
  • Return mapped VariableReferenceExpression directly when the input RowExpression is a VariableReferenceExpression.
  • Return ConstantExpression as-is without invoking RowExpressionTreeRewriter.
  • Retain existing RowExpressionTreeRewriter path for non-leaf expressions.
File: presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/SymbolMapper.java

Change: Optimize PruneUnreferencedOutputs project visitation by avoiding full variable extraction for simple variable assignments.
Details:
  • Detect VariableReferenceExpression assignments and add them directly to expectedInputs.
  • Fall back to VariablesExtractor.extractUnique for non-variable expressions.
  • Keep existing logic for building the filtered Assignments based on referenced variables.
File: presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PruneUnreferencedOutputs.java

Change: Add an early-return fast path in PredicatePushDown when the inherited predicate is TRUE to skip determinism checks.
Details:
  • Check for TRUE_CONSTANT inherited predicate at the start of visitProject.
  • When TRUE_CONSTANT, call context.defaultRewrite on the ProjectNode with TRUE_CONSTANT and skip determinism evaluation over assignments.
  • Preserve existing deterministic variable collection and predicate pushdown logic for non-TRUE inherited predicates.
File: presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PredicatePushDown.java
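
The PredicatePushDown early return can be sketched as follows. This is a simplified model under stated assumptions, not the real visitor: the `Expr` types, `pushableConjuncts`, and the `determinismChecks` counter are hypothetical, and each assignment is reduced to a flag saying whether its expression is deterministic. It illustrates why a TRUE inherited predicate makes the per-assignment scan pointless: with no conjuncts, the scan's result is never used.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; these are NOT Presto's real planner classes.
final class PredicatePushDownSketch {
    interface Expr {}

    static final class Constant implements Expr {
        final Object value;
        Constant(Object value) { this.value = value; }
        @Override public boolean equals(Object other) {
            return other instanceof Constant && value.equals(((Constant) other).value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    // A single conjunct that references one projected variable, e.g. "x > 5".
    static final class Comparison implements Expr {
        final String variable;
        Comparison(String variable) { this.variable = variable; }
    }

    static final class And implements Expr {
        final Expr left;
        final Expr right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static final Constant TRUE_CONSTANT = new Constant(Boolean.TRUE);

    int determinismChecks = 0; // counts simulated isDeterministic evaluations

    static List<Expr> extractConjuncts(Expr predicate) {
        List<Expr> conjuncts = new ArrayList<>();
        collect(predicate, conjuncts);
        return conjuncts;
    }

    private static void collect(Expr predicate, List<Expr> out) {
        if (TRUE_CONSTANT.equals(predicate)) {
            return; // TRUE contributes no conjuncts
        }
        if (predicate instanceof And) {
            collect(((And) predicate).left, out);
            collect(((And) predicate).right, out);
            return;
        }
        out.add(predicate);
    }

    // assignments maps each output variable to whether its expression is deterministic.
    List<Expr> pushableConjuncts(Map<String, Boolean> assignments, Expr inherited, boolean fastPath) {
        if (fastPath && TRUE_CONSTANT.equals(inherited)) {
            return new ArrayList<>(); // nothing to push; skip the per-assignment scan
        }
        Set<String> deterministic = new HashSet<>();
        for (Map.Entry<String, Boolean> assignment : assignments.entrySet()) {
            determinismChecks++;
            if (assignment.getValue()) {
                deterministic.add(assignment.getKey());
            }
        }
        List<Expr> pushable = new ArrayList<>();
        for (Expr conjunct : extractConjuncts(inherited)) {
            if (conjunct instanceof Comparison && deterministic.contains(((Comparison) conjunct).variable)) {
                pushable.add(conjunct);
            }
        }
        return pushable;
    }

    public static void main(String[] args) {
        Map<String, Boolean> assignments = new LinkedHashMap<>();
        for (int i = 0; i < 2225; i++) {
            assignments.put("col" + i, true);
        }
        PredicatePushDownSketch slow = new PredicatePushDownSketch();
        PredicatePushDownSketch fast = new PredicatePushDownSketch();
        int slowSize = slow.pushableConjuncts(assignments, TRUE_CONSTANT, false).size();
        int fastSize = fast.pushableConjuncts(assignments, TRUE_CONSTANT, true).size();
        // Same (empty) result either way; only the number of checks differs.
        System.out.println(slowSize + " " + fastSize + " " + slow.determinismChecks + " " + fast.determinismChecks);
        // prints "0 0 2225 0"
    }
}
```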

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high-level feedback:

  • The fast path in PredicatePushDown.visitProject should likely check context.get() == TRUE_CONSTANT rather than .equals(TRUE_CONSTANT), since an arbitrary expression that evaluates to true but still contains conjuncts would now bypass predicate pushdown, changing behavior rather than just optimizing.
  • In the same fast path, consider early-returning using the existing context.get() instead of hardcoding TRUE_CONSTANT into defaultRewrite to avoid subtly changing behavior if a non-literal true predicate ever reaches this code path.
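
How much the `==` versus `.equals` distinction matters depends on how the constant type defines equality. The sketch below uses hypothetical `Constant` and `Call` types and assumes a value-based `equals` on constants; whether the real ConstantExpression behaves this way should be checked against its source. Under that assumption, a compound expression matches neither check, and the two checks differ only for a second constant instance holding the same value.

```java
import java.util.Objects;

// Hypothetical types for illustration; NOT Presto's ConstantExpression or CallExpression.
final class ConstantEqualitySketch {
    static final class Constant {
        final Object value;
        Constant(Object value) { this.value = value; }
        // Assumption: equality is defined by the wrapped value.
        @Override public boolean equals(Object other) {
            return other instanceof Constant && Objects.equals(value, ((Constant) other).value);
        }
        @Override public int hashCode() { return Objects.hashCode(value); }
    }

    // A compound expression such as "1 = 1"; it is never equal to a Constant.
    static final class Call {
        final String text;
        Call(String text) { this.text = text; }
    }

    static final Constant TRUE_CONSTANT = new Constant(Boolean.TRUE);

    static boolean matchesByIdentity(Object predicate) {
        return predicate == TRUE_CONSTANT;
    }

    static boolean matchesByEquals(Object predicate) {
        return TRUE_CONSTANT.equals(predicate);
    }

    public static void main(String[] args) {
        Object sameInstance = TRUE_CONSTANT;
        Object equalCopy = new Constant(Boolean.TRUE);
        Object compound = new Call("1 = 1");
        System.out.println(matchesByIdentity(sameInstance) + " " + matchesByEquals(sameInstance)); // true true
        System.out.println(matchesByIdentity(equalCopy) + " " + matchesByEquals(equalCopy));       // false true
        System.out.println(matchesByIdentity(compound) + " " + matchesByEquals(compound));         // false false
    }
}
```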
@feilong-liu feilong-liu changed the title from "Additional wide-column query planning optimizations" to "perf(planner): Skip unnecessary processing for leaf expressions and trivial predicates in wide projections" Apr 8, 2026
Three targeted fast paths for wide-column queries:

1. SymbolMapper.map(RowExpression): Skip TreeRewriter instantiation for
   leaf expressions (VariableReferenceExpression and ConstantExpression).

2. PruneUnreferencedOutputs.visitProject: Short-circuit
   VariablesExtractor.extractUnique() for VariableReferenceExpression
   assignments by directly adding the variable.

3. PredicatePushDown.visitProject: Early return when inherited predicate
   is TRUE_CONSTANT, skipping isDeterministic evaluation on all assignments.
@feilong-liu feilong-liu force-pushed the wide-column-additional-optimizations branch from ca0c364 to 69aa6f7 Compare April 8, 2026 22:45
@feilong-liu feilong-liu closed this Apr 9, 2026