Skip to content

feat(planner): Support scalar functions in MV query rewriting#27549

Open
ceekay47 wants to merge 1 commit intoprestodb:masterfrom
ceekay47:export-D99652975
Open

feat(planner): Support scalar functions in MV query rewriting#27549
ceekay47 wants to merge 1 commit intoprestodb:masterfrom
ceekay47:export-D99652975

Conversation

@ceekay47
Copy link
Copy Markdown
Contributor

@ceekay47 ceekay47 commented Apr 9, 2026

Summary:
The MV query rewriter previously rejected all non-aggregate
function calls. This change treats any function not in
ASSOCIATIVE_REWRITE_FUNCTIONS or NON_ASSOCIATIVE_REWRITE_FUNCTIONS as a
scalar function and recursively rewrites its arguments. This supports all
scalar functions (CONCAT, ABS, JSON_EXTRACT, JSON_EXTRACT_SCALAR, LOWER,
CAST, IF, COALESCE, CASE, etc.) without maintaining an allowlist.

Changes:

  • visitFunctionCall: rewrite arguments recursively for non-aggregate functions
  • visitExpression: re-dispatch after prefix stripping so DereferenceExpression
    nodes get mapped correctly
  • Add visitor overrides for IfExpression, CoalesceExpression,
    SearchedCaseExpression, SimpleCaseExpression, WhenClause,
    NullIfExpression, Cast, InPredicate, NotExpression, IsNullPredicate,
    IsNotNullPredicate
  • validateExpressionForGroupBy: allow scalar expressions over GROUP BY columns
  • Remove SUPPORTED_FUNCTION_CALLS gate from validator

Differential Revision: D99652975

Summary by Sourcery

Enable scalar functions and expressions in materialized view query rewriting and validation so more queries can be optimized to read from materialized views.

New Features:

  • Allow scalar function calls in SELECT lists to be rewritten when using materialized views, including nested scalar expressions and JSON functions.
  • Support scalar conditional and casting expressions (IF, COALESCE, CASE, CAST, etc.) in materialized view query rewriting.

Enhancements:

  • Improve expression visitor behavior so prefix-stripped expressions are fully reprocessed, ensuring correct identifier and dereference mapping during rewriting.
  • Relax materialized view query shape validation by removing the function allowlist gate and permitting scalar functions alongside aggregates.

Tests:

  • Add planner and optimizer tests covering scalar functions, JSON extraction, conditional expressions, casts, nested scalar expressions, and unmapped columns in materialized view rewrites.
  • Update Hive materialized view planner tests to reflect that scalar functions in subqueries are now supported and fully optimized.
  • Extend rewrite shape validator tests to confirm scalar and JSON functions with aggregates are accepted.
== RELEASE NOTES ==

General Changes
* Add support for scalar functions in materialized view query rewriting.
  Queries using functions like ``CONCAT``, ``ABS``, ``JSON_EXTRACT``,
  ``CAST``, ``IF``, ``COALESCE``, and ``CASE`` expressions now correctly
  rewrite to scan the materialized view.

Summary:
The MV query rewriter previously rejected all non-aggregate
function calls. This change treats any function not in
ASSOCIATIVE_REWRITE_FUNCTIONS or NON_ASSOCIATIVE_REWRITE_FUNCTIONS as a
scalar function and recursively rewrites its arguments. This supports all
scalar functions (CONCAT, ABS, JSON_EXTRACT, JSON_EXTRACT_SCALAR, LOWER,
CAST, IF, COALESCE, CASE, etc.) without maintaining an allowlist.

Changes:
- visitFunctionCall: rewrite arguments recursively for non-aggregate functions
- visitExpression: re-dispatch after prefix stripping so DereferenceExpression
  nodes get mapped correctly
- Add visitor overrides for IfExpression, CoalesceExpression,
  SearchedCaseExpression, SimpleCaseExpression, WhenClause,
  NullIfExpression, Cast, InPredicate, NotExpression, IsNullPredicate,
  IsNotNullPredicate
- validateExpressionForGroupBy: allow scalar expressions over GROUP BY columns
- Remove SUPPORTED_FUNCTION_CALLS gate from validator

```
== RELEASE NOTES ==

General Changes
* Add support for scalar functions in materialized view query rewriting.
  Queries using functions like ``CONCAT``, ``ABS``, ``JSON_EXTRACT``,
  ``CAST``, ``IF``, ``COALESCE``, and ``CASE`` expressions now correctly
  rewrite to scan the materialized view.
```

Differential Revision: D99652975
@ceekay47 ceekay47 requested review from a team, feilong-liu and jaystarshot as code owners April 9, 2026 04:58
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Apr 9, 2026
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai bot commented Apr 9, 2026

Reviewer's Guide

Extends the materialized view (MV) query optimizer to treat all non-aggregate, non-associative functions as scalar expressions whose arguments are recursively rewritten, and relaxes the query-shape validator so scalar functions and conditionals in SELECT/GROUP BY contexts are supported, with corresponding tests added and updated.

Class diagram for updated MV query optimizer visitors

classDiagram
    class MaterializedViewQueryOptimizer {
        - Map baseToViewColumnMap
        - Set ASSOCIATIVE_REWRITE_FUNCTIONS
        + validateExpressionForGroupBy(groupByOfMaterializedView ~Set<Expression>~, expression Expression) boolean
    }

    class MaterializedViewQueryOptimizer_RewritingVisitor {
        + visitExpression(node Expression, context Void) Node
        + visitFunctionCall(node FunctionCall, context Void) Node
        + visitComparisonExpression(node ComparisonExpression, context Void) Node
        + visitIfExpression(node IfExpression, context Void) Node
        + visitCoalesceExpression(node CoalesceExpression, context Void) Node
        + visitSearchedCaseExpression(node SearchedCaseExpression, context Void) Node
        + visitSimpleCaseExpression(node SimpleCaseExpression, context Void) Node
        + visitWhenClause(node WhenClause, context Void) Node
        + visitNullIfExpression(node NullIfExpression, context Void) Node
        + visitCast(node Cast, context Void) Node
        + visitInPredicate(node InPredicate, context Void) Node
        + visitNotExpression(node NotExpression, context Void) Node
        + visitIsNullPredicate(node IsNullPredicate, context Void) Node
        + visitIsNotNullPredicate(node IsNotNullPredicate, context Void) Node
    }

    class MaterializedViewRewriteQueryShapeValidator {
        + validateMaterializedViewOptimizationQueryShape(querySpecification QuerySpecification) Optional
    }

    class MaterializedViewRewriteQueryShapeValidator_Visitor {
        - Optional errorMessage
        + validateMaterializedViewOptimizationQueryShape(querySpecification QuerySpecification) Optional
        + visitGroupBy(node GroupBy, context Void) Void
        + visitFunctionCall(node FunctionCall, context Void) Void
    }

    MaterializedViewQueryOptimizer "1" *-- "1" MaterializedViewQueryOptimizer_RewritingVisitor : uses
    MaterializedViewRewriteQueryShapeValidator "1" *-- "1" MaterializedViewRewriteQueryShapeValidator_Visitor : uses

    Expression <|-- IfExpression
    Expression <|-- CoalesceExpression
    Expression <|-- SearchedCaseExpression
    Expression <|-- SimpleCaseExpression
    Expression <|-- NullIfExpression
    Expression <|-- Cast
    Expression <|-- InPredicate
    Expression <|-- NotExpression
    Expression <|-- IsNullPredicate
    Expression <|-- FunctionCall
    Expression <|-- ComparisonExpression

    SearchedCaseExpression "*" o-- "*" WhenClause
    SimpleCaseExpression "*" o-- "*" WhenClause
Loading

Flow diagram for visitFunctionCall rewrite logic

flowchart TD
    A["visitFunctionCall(node)"] --> B["Check baseToViewColumnMap.containsKey(node)"]
    B -->|true| C["Return mapped Expression from baseToViewColumnMap"]
    B -->|false| D["Check ASSOCIATIVE_REWRITE_FUNCTIONS.contains(node.name)"]
    D -->|true| E["Handle associative rewrite (existing logic)"]
    D -->|false| F["Treat as scalar function"]
    F --> G["For each argument in node.arguments call process(argument, context)"]
    G --> H["Build rewrittenArguments list"]
    H --> I["Return new FunctionCall with rewrittenArguments and original properties"]
Loading

Flow diagram for validateExpressionForGroupBy with scalar support

flowchart TD
    A["validateExpressionForGroupBy(groupByOfMaterializedView, expression)"] --> B["Compute canRewrite for associative/non-associative functions"]
    B --> C["Compute isScalar:\nexpression is not Identifier\nand not associative FunctionCall"]
    C --> D["Check groupByOfMaterializedView.contains(expression)"]
    D -->|true| E["Return true (valid)"]
    D -->|false| F["Check materializedViewInfo.baseToViewColumnMap.containsKey(expression)"]
    F --> G["Result = groupByOfMaterializedView.contains(expression) || !(inBaseToViewMap || canRewrite || isScalar)"]
    G --> H["Return Result"]
Loading

File-Level Changes

Change Details Files
Treat non-aggregate function calls as scalar functions whose arguments are recursively MV‑rewritten instead of rejecting them.
  • In visitFunctionCall, stop throwing NOT_SUPPORTED for non-associative functions and instead recursively process each argument and rebuild the FunctionCall with rewritten arguments.
  • Ensure function calls that are in the associative rewrite set continue to go through existing aggregate rewrite logic and baseToViewColumnMap handling.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
Improve expression visitor behavior so prefixed expressions and additional scalar/conditional AST nodes are properly traversed and rewritten.
  • In visitExpression, strip removable prefixes, and if the stripped node differs, redispatch via process(stripped, context) to ensure correct node-type-specific handling (e.g., DereferenceExpression).
  • Add explicit visitor overrides for IfExpression, CoalesceExpression, SearchedCaseExpression, SimpleCaseExpression, WhenClause, NullIfExpression, Cast, InPredicate, NotExpression, IsNullPredicate, and IsNotNullPredicate that recursively process their child expressions and rebuild the nodes.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
Allow scalar expressions over GROUP BY columns when validating expressions for MV rewrites.
  • Extend validateExpressionForGroupBy to compute an isScalar flag for non-identifier expressions that are not associative aggregate function calls.
  • Update the validation condition so expressions are considered valid if they are in the MV GROUP BY, directly rewritable via baseToViewColumnMap or aggregate rewrite, or are scalar expressions whose leaves can be rewritten.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
Relax MV rewrite query-shape validation by removing the function allowlist gate so scalar functions are no longer rejected.
  • Remove hasGroupBy tracking and SUPPORTED_FUNCTION_CALLS enforcement from MaterializedViewRewriteQueryShapeValidator.Visitor, leaving only basic shape validation.
  • Delete the function-call visit override that produced a 'function is not supported' error for names outside SUPPORTED_FUNCTION_CALLS.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewRewriteQueryShapeValidator.java
Add and adjust tests to cover scalar function and conditional expression support in MV rewrites, including nested and JSON/scalar cases, and update prior 'unsupported function' tests to reflect new behavior.
  • Add unit tests to TestMaterializedViewQueryOptimizer covering scalar functions (ABS, JSON_EXTRACT_SCALAR, GEOMETRIC_MEAN), IF/COALESCE/CASE, CAST, nested scalar expressions, and scenarios where scalar functions reference unmapped columns (ensuring those queries are not rewritten).
  • Rename and adjust tests in TestMaterializedViewQueryOptimizer and TestHiveMaterializedViewLogicalPlanner to treat previously unsupported functions as scalar, verifying MV-based rewrites and removing strict plan-shape assertions where appropriate.
  • Update TestMaterializedViewRewriteQueryShapeValidator to assert success for scalar functions in SELECT and in combination with aggregates and JSON functions, instead of expecting failures for "unsupported" functions.
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java
presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewRewriteQueryShapeValidator.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • The new isScalar check in validateExpressionForGroupBy is purely syntactic and will treat any non-ASSOCIATIVE FunctionCall as scalar; consider using function metadata (e.g., aggregation/window vs scalar kind) so that aggregate or window functions that aren’t in ASSOCIATIVE_REWRITE_FUNCTIONS don’t accidentally get classified as scalar and allowed through validation.
  • The set of explicit visitor overrides for scalar expressions (IfExpression, CoalesceExpression, Case, NullIfExpression, predicates, etc.) all follow the same pattern of recursively processing children; you may want to centralize this behavior via a common expression visitor or helper to reduce duplication and make it easier to keep new scalar node types consistent in the future.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `isScalar` check in `validateExpressionForGroupBy` is purely syntactic and will treat any non-ASSOCIATIVE `FunctionCall` as scalar; consider using function metadata (e.g., aggregation/window vs scalar kind) so that aggregate or window functions that aren’t in `ASSOCIATIVE_REWRITE_FUNCTIONS` don’t accidentally get classified as scalar and allowed through validation.
- The set of explicit visitor overrides for scalar expressions (`IfExpression`, `CoalesceExpression`, `Case`, `NullIfExpression`, predicates, etc.) all follow the same pattern of recursively processing children; you may want to centralize this behavior via a common expression visitor or helper to reduce duplication and make it easier to keep new scalar node types consistent in the future.

## Individual Comments

### Comment 1
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java" line_range="319-315" />
<code_context>
     }

+    @Test
+    public void testScalarFunctionInSelect()
+    {
+        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
+        String baseQuerySql = format("SELECT ABS(a), SUM(c) FROM %s GROUP BY a, b", BASE_TABLE_1);
+        String expectedRewrittenSql = format("SELECT ABS(a), SUM(sum_c) FROM %s GROUP BY a, b", VIEW_1);
+        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
+    }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for scalar functions and expressions outside the SELECT list (WHERE/HAVING/ORDER BY)

Since the visitor now rewrites IF/CASE/COALESCE/NOT/IN/IS NULL, it would be valuable to add at least one test where these scalar expressions appear in WHERE, HAVING, or ORDER BY and still get rewritten to use the MV (e.g., `WHERE ABS(a) > 0`, `HAVING SUM(c) > 0 AND COALESCE(a, 0) > 10`, or `ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')`). This will exercise the new visitor overrides beyond projections.

Suggested implementation:

```java
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test

    @Test
    public void testScalarFunctionInSelect()
    {
        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
        String baseQuerySql = format("SELECT ABS(a), SUM(c) FROM %s GROUP BY a, b", BASE_TABLE_1);
        String expectedRewrittenSql = format("SELECT ABS(a), SUM(sum_c) FROM %s GROUP BY a, b", VIEW_1);
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test
    public void testScalarFunctionInWhere()
    {
        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
        String baseQuerySql = format(
                "SELECT ABS(a), SUM(c) FROM %s WHERE ABS(a) > 0 GROUP BY a, b",
                BASE_TABLE_1);
        String expectedRewrittenSql = format(
                "SELECT ABS(a), SUM(sum_c) FROM %s WHERE ABS(a) > 0 GROUP BY a, b",
                VIEW_1);
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test
    public void testScalarExpressionInHaving()
    {
        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
        String baseQuerySql = format(
                "SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b HAVING SUM(c) > 0 AND COALESCE(a, 0) > 10",
                BASE_TABLE_1);
        String expectedRewrittenSql = format(
                "SELECT a, b, SUM(sum_c) AS sum_c FROM %s GROUP BY a, b HAVING SUM(sum_c) > 0 AND COALESCE(a, 0) > 10",
                VIEW_1);
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test
    public void testScalarFunctionInOrderBy()
    {
        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
        String baseQuerySql = format(
                "SELECT JSON_EXTRACT_SCALAR(a, '$.key'), SUM(c) FROM %s GROUP BY a, b ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')",
                BASE_TABLE_1);
        String expectedRewrittenSql = format(
                "SELECT JSON_EXTRACT_SCALAR(a, '$.key'), SUM(sum_c) FROM %s GROUP BY a, b ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')",
                VIEW_1);
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test

```

These edits assume:
1. `assertOptimizedQuery`, `BASE_TABLE_1`, `VIEW_1`, and `format` are already available in the class (as in the existing tests).
2. The base table column `a` is a type compatible with `ABS`, `COALESCE`, and `JSON_EXTRACT_SCALAR` usage in other tests in this file. If `a` is not JSON-typed anywhere else, you may want to adjust the `JSON_EXTRACT_SCALAR` expression to use whichever column is JSON in the existing test suite.
3. If the style in the surrounding file prefers single-line SQL strings, you can inline the multi-line `format` calls to match that convention.
</issue_to_address>

### Comment 2
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java" line_range="373-315" />
<code_context>
+    }
+
+    @Test
+    public void testNestedScalarFunctionsInSelect()
+    {
+        String originalViewSql = format("SELECT a as mv_a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
+        String baseQuerySql = format("SELECT ABS(a + b), SUM(c) FROM %s GROUP BY a, b", BASE_TABLE_1);
+        String expectedRewrittenSql = format("SELECT ABS(mv_a + b), SUM(sum_c) FROM %s GROUP BY mv_a, b", VIEW_1);
+        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
+    }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for scalar wrapping an aggregate (e.g., ABS(SUM(c))) to cover validateExpressionForGroupBy changes

In particular, a query like `SELECT ABS(SUM(c)) FROM ... GROUP BY a` would exercise both the validator and the rewriter: ensuring the validator accepts a scalar over an aggregate on MV columns, and that the rewriter correctly rewrites the inner aggregate (e.g., `SUM(c)``SUM(sum_c)`) while preserving the outer scalar wrapper.
</issue_to_address>

### Comment 3
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java" line_range="381-388" />
<code_context>
+    }
+
+    @Test
+    public void testScalarFunctionUnmappedColumn()
+    {
+        String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
+        String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
+        assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
+    }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a mixed mapped/unmapped scalar expression case (e.g., ABS(a + d)) to confirm correct fallback behavior

Right now we only cover a scalar function over an unmapped column, where the optimizer correctly avoids rewriting. Please also add a case where the scalar expression mixes mapped and unmapped inputs (e.g., `SELECT ABS(a + d), SUM(b) ...` with the MV containing `a` and `b` but not `d`) and assert that the optimizer still falls back to the base query rather than partially rewriting the scalar expression.

```suggestion
    @Test
    public void testScalarFunctionUnmappedColumn()
    {
        String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
        String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
        assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test
    public void testScalarFunctionMixedMappedAndUnmappedColumns()
    {
        String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
        String baseQuerySql = format("SELECT ABS(a + d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
        assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

```
</issue_to_address>

### Comment 4
<location path="presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java" line_range="2124-2127" />
<code_context>

     @Test
-    public void testMaterializedViewOptimizationWithUnsupportedFunctionSubquery()
+    public void testMaterializedViewOptimizationWithScalarFunctionSubquery()
     {
         Session queryOptimizationWithMaterializedView = Session.builder(getSession())
</code_context>
<issue_to_address>
**suggestion (testing):** Reintroduce a plan-level assertion to ensure the scalar-function subqueries actually use the materialized views

This test used to assert on the plan to confirm MV usage but now only checks results. To keep verifying that `length()` subqueries are actually rewritten to use the MV, please add a minimal `assertPlan` (or similar) that ensures at least one `TableScan` on the MV rather than only the base tables. Without this, the test may still pass even if MV rewriting is accidentally disabled.

Suggested implementation:

```java
            // Both subqueries should be rewritten to use MVs (length() is now supported as a scalar function)
            MaterializedResult optimizedQueryResult = computeActual(queryOptimizationWithMaterializedView, baseQuery);
            MaterializedResult baseQueryResult = computeActual(baseQuery);
            assertEquals(baseQueryResult, optimizedQueryResult);

            // Assert on the plan to ensure the scalar-function subqueries are actually rewritten to use the materialized views
            assertPlan(
                    queryOptimizationWithMaterializedView,
                    baseQuery,
                    anyTree(
                            // Ensure at least one TableScan comes from the materialized view instead of the base tables
                            tableScan("<materialized_view_schema>", "<materialized_view_table_name>")));

```

1. Replace `"<materialized_view_schema>"` and `"<materialized_view_table_name>"` with the actual schema and table name of the materialized view(s) used in this test (for example, if earlier in the test you created `test_schema.mv_orders`, use those values here).
2. If not already statically imported at the top of this test file, ensure you have:
   - `import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.anyTree;`
   - `import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.tableScan;`
   - `import static com.facebook.presto.testing.assertions.PlanAssert.assertPlan;`
3. If there are two distinct materialized views used for `s1` and `s2`, you may want to strengthen the assertion by wrapping both in an `assertPlan` pattern like `anyTree(join(..., tableScan(mv1), tableScan(mv2)))` or by adding a second `tableScan` matcher, matching how other MV tests in this file assert usage.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@@ -315,6 +315,77 @@ public void testWithGroupByGroupingSetsAndOrderByOrdinals()
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add coverage for scalar functions and expressions outside the SELECT list (WHERE/HAVING/ORDER BY)

Since the visitor now rewrites IF/CASE/COALESCE/NOT/IN/IS NULL, it would be valuable to add at least one test where these scalar expressions appear in WHERE, HAVING, or ORDER BY and still get rewritten to use the MV (e.g., WHERE ABS(a) > 0, HAVING SUM(c) > 0 AND COALESCE(a, 0) > 10, or ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')). This will exercise the new visitor overrides beyond projections.

Suggested implementation:

        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test

    @Test
    public void testScalarFunctionInSelect()
    {
        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
        String baseQuerySql = format("SELECT ABS(a), SUM(c) FROM %s GROUP BY a, b", BASE_TABLE_1);
        String expectedRewrittenSql = format("SELECT ABS(a), SUM(sum_c) FROM %s GROUP BY a, b", VIEW_1);
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test
    public void testScalarFunctionInWhere()
    {
        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
        String baseQuerySql = format(
                "SELECT ABS(a), SUM(c) FROM %s WHERE ABS(a) > 0 GROUP BY a, b",
                BASE_TABLE_1);
        String expectedRewrittenSql = format(
                "SELECT ABS(a), SUM(sum_c) FROM %s WHERE ABS(a) > 0 GROUP BY a, b",
                VIEW_1);
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test
    public void testScalarExpressionInHaving()
    {
        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
        String baseQuerySql = format(
                "SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b HAVING SUM(c) > 0 AND COALESCE(a, 0) > 10",
                BASE_TABLE_1);
        String expectedRewrittenSql = format(
                "SELECT a, b, SUM(sum_c) AS sum_c FROM %s GROUP BY a, b HAVING SUM(sum_c) > 0 AND COALESCE(a, 0) > 10",
                VIEW_1);
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test
    public void testScalarFunctionInOrderBy()
    {
        String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
        String baseQuerySql = format(
                "SELECT JSON_EXTRACT_SCALAR(a, '$.key'), SUM(c) FROM %s GROUP BY a, b ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')",
                BASE_TABLE_1);
        String expectedRewrittenSql = format(
                "SELECT JSON_EXTRACT_SCALAR(a, '$.key'), SUM(sum_c) FROM %s GROUP BY a, b ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')",
                VIEW_1);
        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
    }

    @Test

These edits assume:

  1. assertOptimizedQuery, BASE_TABLE_1, VIEW_1, and format are already available in the class (as in the existing tests).
  2. The base table column a is a type compatible with ABS, COALESCE, and JSON_EXTRACT_SCALAR usage in other tests in this file. If a is not JSON-typed anywhere else, you may want to adjust the JSON_EXTRACT_SCALAR expression to use whichever column is JSON in the existing test suite.
  3. If the style in the surrounding file prefers single-line SQL strings, you can inline the multi-line format calls to match that convention.

@@ -315,6 +315,77 @@ public void testWithGroupByGroupingSetsAndOrderByOrdinals()
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding a test for scalar wrapping an aggregate (e.g., ABS(SUM(c))) to cover validateExpressionForGroupBy changes

In particular, a query like SELECT ABS(SUM(c)) FROM ... GROUP BY a would exercise both the validator and the rewriter: ensuring the validator accepts a scalar over an aggregate on MV columns, and that the rewriter correctly rewrites the inner aggregate (e.g., SUM(c)SUM(sum_c)) while preserving the outer scalar wrapper.

Comment on lines +381 to +388
@Test
public void testScalarFunctionUnmappedColumn()
{
String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add a mixed mapped/unmapped scalar expression case (e.g., ABS(a + d)) to confirm correct fallback behavior

Right now we only cover a scalar function over an unmapped column, where the optimizer correctly avoids rewriting. Please also add a case where the scalar expression mixes mapped and unmapped inputs (e.g., SELECT ABS(a + d), SUM(b) ... with the MV containing a and b but not d) and assert that the optimizer still falls back to the base query rather than partially rewriting the scalar expression.

Suggested change
@Test
public void testScalarFunctionUnmappedColumn()
{
String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarFunctionUnmappedColumn()
{
String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarFunctionMixedMappedAndUnmappedColumns()
{
String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
String baseQuerySql = format("SELECT ABS(a + d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
}

Comment on lines +2124 to 2127
public void testMaterializedViewOptimizationWithScalarFunctionSubquery()
{
Session queryOptimizationWithMaterializedView = Session.builder(getSession())
.setSystemProperty(QUERY_OPTIMIZATION_WITH_MATERIALIZED_VIEW_ENABLED, "true")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Reintroduce a plan-level assertion to ensure the scalar-function subqueries actually use the materialized views

This test used to assert on the plan to confirm MV usage but now only checks results. To keep verifying that length() subqueries are actually rewritten to use the MV, please add a minimal assertPlan (or similar) that ensures at least one TableScan on the MV rather than only the base tables. Without this, the test may still pass even if MV rewriting is accidentally disabled.

Suggested implementation:

            // Both subqueries should be rewritten to use MVs (length() is now supported as a scalar function)
            MaterializedResult optimizedQueryResult = computeActual(queryOptimizationWithMaterializedView, baseQuery);
            MaterializedResult baseQueryResult = computeActual(baseQuery);
            assertEquals(baseQueryResult, optimizedQueryResult);

            // Assert on the plan to ensure the scalar-function subqueries are actually rewritten to use the materialized views
            assertPlan(
                    queryOptimizationWithMaterializedView,
                    baseQuery,
                    anyTree(
                            // Ensure at least one TableScan comes from the materialized view instead of the base tables
                            tableScan("<materialized_view_schema>", "<materialized_view_table_name>")));
  1. Replace "<materialized_view_schema>" and "<materialized_view_table_name>" with the actual schema and table name of the materialized view(s) used in this test (for example, if earlier in the test you created test_schema.mv_orders, use those values here).
  2. If not already statically imported at the top of this test file, ensure you have:
    • import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.anyTree;
    • import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.tableScan;
    • import static com.facebook.presto.testing.assertions.PlanAssert.assertPlan;
  3. If there are two distinct materialized views used for s1 and s2, you may want to strengthen the assertion by wrapping both in an assertPlan pattern like anyTree(join(..., tableScan(mv1), tableScan(mv2))) or by adding a second tableScan matcher, matching how other MV tests in this file assert usage.

@ceekay47 ceekay47 requested a review from hantangwangd April 9, 2026 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants