feat(planner): Support scalar functions in MV query rewriting#27549
feat(planner): Support scalar functions in MV query rewriting#27549ceekay47 wants to merge 1 commit intoprestodb:masterfrom
Conversation
Summary: The MV query rewriter previously rejected all non-aggregate function calls. This change treats any function not in ASSOCIATIVE_REWRITE_FUNCTIONS or NON_ASSOCIATIVE_REWRITE_FUNCTIONS as a scalar function and recursively rewrites its arguments. This supports all scalar functions (CONCAT, ABS, JSON_EXTRACT, JSON_EXTRACT_SCALAR, LOWER, CAST, IF, COALESCE, CASE, etc.) without maintaining an allowlist. Changes: - visitFunctionCall: rewrite arguments recursively for non-aggregate functions - visitExpression: re-dispatch after prefix stripping so DereferenceExpression nodes get mapped correctly - Add visitor overrides for IfExpression, CoalesceExpression, SearchedCaseExpression, SimpleCaseExpression, WhenClause, NullIfExpression, Cast, InPredicate, NotExpression, IsNullPredicate, IsNotNullPredicate - validateExpressionForGroupBy: allow scalar expressions over GROUP BY columns - Remove SUPPORTED_FUNCTION_CALLS gate from validator ``` == RELEASE NOTES == General Changes * Add support for scalar functions in materialized view query rewriting. Queries using functions like ``CONCAT``, ``ABS``, ``JSON_EXTRACT``, ``CAST``, ``IF``, ``COALESCE``, and ``CASE`` expressions now correctly rewrite to scan the materialized view. ``` Differential Revision: D99652975
Reviewer's GuideExtends the materialized view (MV) query optimizer to treat all non-aggregate, non-associative functions as scalar expressions whose arguments are recursively rewritten, and relaxes the query-shape validator so scalar functions and conditionals in SELECT/GROUP BY contexts are supported, with corresponding tests added and updated. Class diagram for updated MV query optimizer visitorsclassDiagram
class MaterializedViewQueryOptimizer {
- Map baseToViewColumnMap
- Set ASSOCIATIVE_REWRITE_FUNCTIONS
+ validateExpressionForGroupBy(groupByOfMaterializedView ~Set<Expression>~, expression Expression) boolean
}
class MaterializedViewQueryOptimizer_RewritingVisitor {
+ visitExpression(node Expression, context Void) Node
+ visitFunctionCall(node FunctionCall, context Void) Node
+ visitComparisonExpression(node ComparisonExpression, context Void) Node
+ visitIfExpression(node IfExpression, context Void) Node
+ visitCoalesceExpression(node CoalesceExpression, context Void) Node
+ visitSearchedCaseExpression(node SearchedCaseExpression, context Void) Node
+ visitSimpleCaseExpression(node SimpleCaseExpression, context Void) Node
+ visitWhenClause(node WhenClause, context Void) Node
+ visitNullIfExpression(node NullIfExpression, context Void) Node
+ visitCast(node Cast, context Void) Node
+ visitInPredicate(node InPredicate, context Void) Node
+ visitNotExpression(node NotExpression, context Void) Node
+ visitIsNullPredicate(node IsNullPredicate, context Void) Node
+ visitIsNotNullPredicate(node IsNotNullPredicate, context Void) Node
}
class MaterializedViewRewriteQueryShapeValidator {
+ validateMaterializedViewOptimizationQueryShape(querySpecification QuerySpecification) Optional
}
class MaterializedViewRewriteQueryShapeValidator_Visitor {
- Optional errorMessage
+ validateMaterializedViewOptimizationQueryShape(querySpecification QuerySpecification) Optional
+ visitGroupBy(node GroupBy, context Void) Void
+ visitFunctionCall(node FunctionCall, context Void) Void
}
MaterializedViewQueryOptimizer "1" *-- "1" MaterializedViewQueryOptimizer_RewritingVisitor : uses
MaterializedViewRewriteQueryShapeValidator "1" *-- "1" MaterializedViewRewriteQueryShapeValidator_Visitor : uses
Expression <|-- IfExpression
Expression <|-- CoalesceExpression
Expression <|-- SearchedCaseExpression
Expression <|-- SimpleCaseExpression
Expression <|-- NullIfExpression
Expression <|-- Cast
Expression <|-- InPredicate
Expression <|-- NotExpression
Expression <|-- IsNullPredicate
Expression <|-- FunctionCall
Expression <|-- ComparisonExpression
SearchedCaseExpression "*" o-- "*" WhenClause
SimpleCaseExpression "*" o-- "*" WhenClause
Flow diagram for visitFunctionCall rewrite logicflowchart TD
A["visitFunctionCall(node)"] --> B["Check baseToViewColumnMap.containsKey(node)"]
B -->|true| C["Return mapped Expression from baseToViewColumnMap"]
B -->|false| D["Check ASSOCIATIVE_REWRITE_FUNCTIONS.contains(node.name)"]
D -->|true| E["Handle associative rewrite (existing logic)"]
D -->|false| F["Treat as scalar function"]
F --> G["For each argument in node.arguments call process(argument, context)"]
G --> H["Build rewrittenArguments list"]
H --> I["Return new FunctionCall with rewrittenArguments and original properties"]
Flow diagram for validateExpressionForGroupBy with scalar supportflowchart TD
A["validateExpressionForGroupBy(groupByOfMaterializedView, expression)"] --> B["Compute canRewrite for associative/non-associative functions"]
B --> C["Compute isScalar:\nexpression is not Identifier\nand not associative FunctionCall"]
C --> D["Check groupByOfMaterializedView.contains(expression)"]
D -->|true| E["Return true (valid)"]
D -->|false| F["Check materializedViewInfo.baseToViewColumnMap.containsKey(expression)"]
F --> G["Result = groupByOfMaterializedView.contains(expression) || !(inBaseToViewMap || canRewrite || isScalar)"]
G --> H["Return Result"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- The new
isScalarcheck invalidateExpressionForGroupByis purely syntactic and will treat any non-ASSOCIATIVEFunctionCallas scalar; consider using function metadata (e.g., aggregation/window vs scalar kind) so that aggregate or window functions that aren’t inASSOCIATIVE_REWRITE_FUNCTIONSdon’t accidentally get classified as scalar and allowed through validation. - The set of explicit visitor overrides for scalar expressions (
IfExpression,CoalesceExpression,Case,NullIfExpression, predicates, etc.) all follow the same pattern of recursively processing children; you may want to centralize this behavior via a common expression visitor or helper to reduce duplication and make it easier to keep new scalar node types consistent in the future.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `isScalar` check in `validateExpressionForGroupBy` is purely syntactic and will treat any non-ASSOCIATIVE `FunctionCall` as scalar; consider using function metadata (e.g., aggregation/window vs scalar kind) so that aggregate or window functions that aren’t in `ASSOCIATIVE_REWRITE_FUNCTIONS` don’t accidentally get classified as scalar and allowed through validation.
- The set of explicit visitor overrides for scalar expressions (`IfExpression`, `CoalesceExpression`, `Case`, `NullIfExpression`, predicates, etc.) all follow the same pattern of recursively processing children; you may want to centralize this behavior via a common expression visitor or helper to reduce duplication and make it easier to keep new scalar node types consistent in the future.
## Individual Comments
### Comment 1
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java" line_range="319-315" />
<code_context>
}
+ @Test
+ public void testScalarFunctionInSelect()
+ {
+ String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
+ String baseQuerySql = format("SELECT ABS(a), SUM(c) FROM %s GROUP BY a, b", BASE_TABLE_1);
+ String expectedRewrittenSql = format("SELECT ABS(a), SUM(sum_c) FROM %s GROUP BY a, b", VIEW_1);
+ assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
+ }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for scalar functions and expressions outside the SELECT list (WHERE/HAVING/ORDER BY)
Since the visitor now rewrites IF/CASE/COALESCE/NOT/IN/IS NULL, it would be valuable to add at least one test where these scalar expressions appear in WHERE, HAVING, or ORDER BY and still get rewritten to use the MV (e.g., `WHERE ABS(a) > 0`, `HAVING SUM(c) > 0 AND COALESCE(a, 0) > 10`, or `ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')`). This will exercise the new visitor overrides beyond projections.
Suggested implementation:
```java
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
@Test
public void testScalarFunctionInSelect()
{
String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
String baseQuerySql = format("SELECT ABS(a), SUM(c) FROM %s GROUP BY a, b", BASE_TABLE_1);
String expectedRewrittenSql = format("SELECT ABS(a), SUM(sum_c) FROM %s GROUP BY a, b", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarFunctionInWhere()
{
String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
String baseQuerySql = format(
"SELECT ABS(a), SUM(c) FROM %s WHERE ABS(a) > 0 GROUP BY a, b",
BASE_TABLE_1);
String expectedRewrittenSql = format(
"SELECT ABS(a), SUM(sum_c) FROM %s WHERE ABS(a) > 0 GROUP BY a, b",
VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarExpressionInHaving()
{
String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
String baseQuerySql = format(
"SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b HAVING SUM(c) > 0 AND COALESCE(a, 0) > 10",
BASE_TABLE_1);
String expectedRewrittenSql = format(
"SELECT a, b, SUM(sum_c) AS sum_c FROM %s GROUP BY a, b HAVING SUM(sum_c) > 0 AND COALESCE(a, 0) > 10",
VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarFunctionInOrderBy()
{
String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
String baseQuerySql = format(
"SELECT JSON_EXTRACT_SCALAR(a, '$.key'), SUM(c) FROM %s GROUP BY a, b ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')",
BASE_TABLE_1);
String expectedRewrittenSql = format(
"SELECT JSON_EXTRACT_SCALAR(a, '$.key'), SUM(sum_c) FROM %s GROUP BY a, b ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')",
VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
```
These edits assume:
1. `assertOptimizedQuery`, `BASE_TABLE_1`, `VIEW_1`, and `format` are already available in the class (as in the existing tests).
2. The base table column `a` is a type compatible with `ABS`, `COALESCE`, and `JSON_EXTRACT_SCALAR` usage in other tests in this file. If `a` is not JSON-typed anywhere else, you may want to adjust the `JSON_EXTRACT_SCALAR` expression to use whichever column is JSON in the existing test suite.
3. If the style in the surrounding file prefers single-line SQL strings, you can inline the multi-line `format` calls to match that convention.
</issue_to_address>
### Comment 2
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java" line_range="373-315" />
<code_context>
+ }
+
+ @Test
+ public void testNestedScalarFunctionsInSelect()
+ {
+ String originalViewSql = format("SELECT a as mv_a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
+ String baseQuerySql = format("SELECT ABS(a + b), SUM(c) FROM %s GROUP BY a, b", BASE_TABLE_1);
+ String expectedRewrittenSql = format("SELECT ABS(mv_a + b), SUM(sum_c) FROM %s GROUP BY mv_a, b", VIEW_1);
+ assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
+ }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for scalar wrapping an aggregate (e.g., ABS(SUM(c))) to cover validateExpressionForGroupBy changes
In particular, a query like `SELECT ABS(SUM(c)) FROM ... GROUP BY a` would exercise both the validator and the rewriter: ensuring the validator accepts a scalar over an aggregate on MV columns, and that the rewriter correctly rewrites the inner aggregate (e.g., `SUM(c)` → `SUM(sum_c)`) while preserving the outer scalar wrapper.
</issue_to_address>
### Comment 3
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java" line_range="381-388" />
<code_context>
+ }
+
+ @Test
+ public void testScalarFunctionUnmappedColumn()
+ {
+ String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
+ String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
+ assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
+ }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a mixed mapped/unmapped scalar expression case (e.g., ABS(a + d)) to confirm correct fallback behavior
Right now we only cover a scalar function over an unmapped column, where the optimizer correctly avoids rewriting. Please also add a case where the scalar expression mixes mapped and unmapped inputs (e.g., `SELECT ABS(a + d), SUM(b) ...` with the MV containing `a` and `b` but not `d`) and assert that the optimizer still falls back to the base query rather than partially rewriting the scalar expression.
```suggestion
@Test
public void testScalarFunctionUnmappedColumn()
{
String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarFunctionMixedMappedAndUnmappedColumns()
{
String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1);
String baseQuerySql = format("SELECT ABS(a + d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1);
assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
```
</issue_to_address>
### Comment 4
<location path="presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java" line_range="2124-2127" />
<code_context>
@Test
- public void testMaterializedViewOptimizationWithUnsupportedFunctionSubquery()
+ public void testMaterializedViewOptimizationWithScalarFunctionSubquery()
{
Session queryOptimizationWithMaterializedView = Session.builder(getSession())
</code_context>
<issue_to_address>
**suggestion (testing):** Reintroduce a plan-level assertion to ensure the scalar-function subqueries actually use the materialized views
This test used to assert on the plan to confirm MV usage but now only checks results. To keep verifying that `length()` subqueries are actually rewritten to use the MV, please add a minimal `assertPlan` (or similar) that ensures at least one `TableScan` on the MV rather than only the base tables. Without this, the test may still pass even if MV rewriting is accidentally disabled.
Suggested implementation:
```java
// Both subqueries should be rewritten to use MVs (length() is now supported as a scalar function)
MaterializedResult optimizedQueryResult = computeActual(queryOptimizationWithMaterializedView, baseQuery);
MaterializedResult baseQueryResult = computeActual(baseQuery);
assertEquals(baseQueryResult, optimizedQueryResult);
// Assert on the plan to ensure the scalar-function subqueries are actually rewritten to use the materialized views
assertPlan(
queryOptimizationWithMaterializedView,
baseQuery,
anyTree(
// Ensure at least one TableScan comes from the materialized view instead of the base tables
tableScan("<materialized_view_schema>", "<materialized_view_table_name>")));
```
1. Replace `"<materialized_view_schema>"` and `"<materialized_view_table_name>"` with the actual schema and table name of the materialized view(s) used in this test (for example, if earlier in the test you created `test_schema.mv_orders`, use those values here).
2. If not already statically imported at the top of this test file, ensure you have:
- `import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.anyTree;`
- `import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.tableScan;`
- `import static com.facebook.presto.testing.assertions.PlanAssert.assertPlan;`
3. If there are two distinct materialized views used for `s1` and `s2`, you may want to strengthen the assertion by wrapping both in an `assertPlan` pattern like `anyTree(join(..., tableScan(mv1), tableScan(mv2)))` or by adding a second `tableScan` matcher, matching how other MV tests in this file assert usage.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @@ -315,6 +315,77 @@ public void testWithGroupByGroupingSetsAndOrderByOrdinals() | |||
| assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1); | |||
There was a problem hiding this comment.
suggestion (testing): Add coverage for scalar functions and expressions outside the SELECT list (WHERE/HAVING/ORDER BY)
Since the visitor now rewrites IF/CASE/COALESCE/NOT/IN/IS NULL, it would be valuable to add at least one test where these scalar expressions appear in WHERE, HAVING, or ORDER BY and still get rewritten to use the MV (e.g., WHERE ABS(a) > 0, HAVING SUM(c) > 0 AND COALESCE(a, 0) > 10, or ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')). This will exercise the new visitor overrides beyond projections.
Suggested implementation:
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
@Test
public void testScalarFunctionInSelect()
{
String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
String baseQuerySql = format("SELECT ABS(a), SUM(c) FROM %s GROUP BY a, b", BASE_TABLE_1);
String expectedRewrittenSql = format("SELECT ABS(a), SUM(sum_c) FROM %s GROUP BY a, b", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarFunctionInWhere()
{
String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
String baseQuerySql = format(
"SELECT ABS(a), SUM(c) FROM %s WHERE ABS(a) > 0 GROUP BY a, b",
BASE_TABLE_1);
String expectedRewrittenSql = format(
"SELECT ABS(a), SUM(sum_c) FROM %s WHERE ABS(a) > 0 GROUP BY a, b",
VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarExpressionInHaving()
{
String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
String baseQuerySql = format(
"SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b HAVING SUM(c) > 0 AND COALESCE(a, 0) > 10",
BASE_TABLE_1);
String expectedRewrittenSql = format(
"SELECT a, b, SUM(sum_c) AS sum_c FROM %s GROUP BY a, b HAVING SUM(sum_c) > 0 AND COALESCE(a, 0) > 10",
VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testScalarFunctionInOrderBy()
{
String originalViewSql = format("SELECT a, b, SUM(c) AS sum_c FROM %s GROUP BY a, b", BASE_TABLE_1);
String baseQuerySql = format(
"SELECT JSON_EXTRACT_SCALAR(a, '$.key'), SUM(c) FROM %s GROUP BY a, b ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')",
BASE_TABLE_1);
String expectedRewrittenSql = format(
"SELECT JSON_EXTRACT_SCALAR(a, '$.key'), SUM(sum_c) FROM %s GROUP BY a, b ORDER BY JSON_EXTRACT_SCALAR(a, '$.key')",
VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@TestThese edits assume:
assertOptimizedQuery,BASE_TABLE_1,VIEW_1, andformatare already available in the class (as in the existing tests).- The base table column
ais a type compatible withABS,COALESCE, andJSON_EXTRACT_SCALARusage in other tests in this file. Ifais not JSON-typed anywhere else, you may want to adjust theJSON_EXTRACT_SCALARexpression to use whichever column is JSON in the existing test suite. - If the style in the surrounding file prefers single-line SQL strings, you can inline the multi-line
formatcalls to match that convention.
| @@ -315,6 +315,77 @@ public void testWithGroupByGroupingSetsAndOrderByOrdinals() | |||
| assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1); | |||
There was a problem hiding this comment.
suggestion (testing): Consider adding a test for scalar wrapping an aggregate (e.g., ABS(SUM(c))) to cover validateExpressionForGroupBy changes
In particular, a query like SELECT ABS(SUM(c)) FROM ... GROUP BY a would exercise both the validator and the rewriter: ensuring the validator accepts a scalar over an aggregate on MV columns, and that the rewriter correctly rewrites the inner aggregate (e.g., SUM(c) → SUM(sum_c)) while preserving the outer scalar wrapper.
| @Test | ||
| public void testScalarFunctionUnmappedColumn() | ||
| { | ||
| String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1); | ||
| String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1); | ||
| assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1); | ||
| } | ||
|
|
There was a problem hiding this comment.
suggestion (testing): Add a mixed mapped/unmapped scalar expression case (e.g., ABS(a + d)) to confirm correct fallback behavior
Right now we only cover a scalar function over an unmapped column, where the optimizer correctly avoids rewriting. Please also add a case where the scalar expression mixes mapped and unmapped inputs (e.g., SELECT ABS(a + d), SUM(b) ... with the MV containing a and b but not d) and assert that the optimizer still falls back to the base query rather than partially rewriting the scalar expression.
| @Test | |
| public void testScalarFunctionUnmappedColumn() | |
| { | |
| String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1); | |
| String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1); | |
| assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1); | |
| } | |
| @Test | |
| public void testScalarFunctionUnmappedColumn() | |
| { | |
| String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1); | |
| String baseQuerySql = format("SELECT ABS(d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1); | |
| assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1); | |
| } | |
| @Test | |
| public void testScalarFunctionMixedMappedAndUnmappedColumns() | |
| { | |
| String originalViewSql = format("SELECT a, SUM(b) AS sum_b FROM %s GROUP BY a", BASE_TABLE_1); | |
| String baseQuerySql = format("SELECT ABS(a + d), SUM(b) FROM %s GROUP BY a", BASE_TABLE_1); | |
| assertOptimizedQuery(baseQuerySql, baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1); | |
| } | |
| public void testMaterializedViewOptimizationWithScalarFunctionSubquery() | ||
| { | ||
| Session queryOptimizationWithMaterializedView = Session.builder(getSession()) | ||
| .setSystemProperty(QUERY_OPTIMIZATION_WITH_MATERIALIZED_VIEW_ENABLED, "true") |
There was a problem hiding this comment.
suggestion (testing): Reintroduce a plan-level assertion to ensure the scalar-function subqueries actually use the materialized views
This test used to assert on the plan to confirm MV usage but now only checks results. To keep verifying that length() subqueries are actually rewritten to use the MV, please add a minimal assertPlan (or similar) that ensures at least one TableScan on the MV rather than only the base tables. Without this, the test may still pass even if MV rewriting is accidentally disabled.
Suggested implementation:
// Both subqueries should be rewritten to use MVs (length() is now supported as a scalar function)
MaterializedResult optimizedQueryResult = computeActual(queryOptimizationWithMaterializedView, baseQuery);
MaterializedResult baseQueryResult = computeActual(baseQuery);
assertEquals(baseQueryResult, optimizedQueryResult);
// Assert on the plan to ensure the scalar-function subqueries are actually rewritten to use the materialized views
assertPlan(
queryOptimizationWithMaterializedView,
baseQuery,
anyTree(
// Ensure at least one TableScan comes from the materialized view instead of the base tables
tableScan("<materialized_view_schema>", "<materialized_view_table_name>")));- Replace
"<materialized_view_schema>"and"<materialized_view_table_name>"with the actual schema and table name of the materialized view(s) used in this test (for example, if earlier in the test you createdtest_schema.mv_orders, use those values here). - If not already statically imported at the top of this test file, ensure you have:
import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.anyTree;import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.tableScan;import static com.facebook.presto.testing.assertions.PlanAssert.assertPlan;
- If there are two distinct materialized views used for
s1ands2, you may want to strengthen the assertion by wrapping both in anassertPlanpattern likeanyTree(join(..., tableScan(mv1), tableScan(mv2)))or by adding a secondtableScanmatcher, matching how other MV tests in this file assert usage.
Summary:
The MV query rewriter previously rejected all non-aggregate
function calls. This change treats any function not in
ASSOCIATIVE_REWRITE_FUNCTIONS or NON_ASSOCIATIVE_REWRITE_FUNCTIONS as a
scalar function and recursively rewrites its arguments. This supports all
scalar functions (CONCAT, ABS, JSON_EXTRACT, JSON_EXTRACT_SCALAR, LOWER,
CAST, IF, COALESCE, CASE, etc.) without maintaining an allowlist.
Changes:
nodes get mapped correctly
SearchedCaseExpression, SimpleCaseExpression, WhenClause,
NullIfExpression, Cast, InPredicate, NotExpression, IsNullPredicate,
IsNotNullPredicate
Differential Revision: D99652975
Summary by Sourcery
Enable scalar functions and expressions in materialized view query rewriting and validation so more queries can be optimized to read from materialized views.
New Features:
Enhancements:
Tests: