feat(planner): Support GROUP BY and ORDER BY ordinals in MV query rewriting (#27422)#27422
Conversation
Reviewer's GuideAdds support in MaterializedViewQueryOptimizer for materialized view query rewriting when base queries use GROUP BY and ORDER BY ordinals, and introduces tests to validate the new behavior. Sequence diagram for MV rewrite with GROUP BY and ORDER BY ordinalssequenceDiagram
actor User
participant Planner
participant MaterializedViewQueryOptimizer
participant MaterializedViewInfo
User->>Planner: Submit query with SELECT, GROUP BY 1, ORDER BY 2
Planner->>MaterializedViewQueryOptimizer: Optimize with materialized views
MaterializedViewQueryOptimizer->>MaterializedViewInfo: getGroupBy()
MaterializedViewInfo-->>MaterializedViewQueryOptimizer: Optional<Set<Expression>> groupBy
loop For each GroupingElement
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: removeGroupingElementPrefix(element, removablePrefix)
alt groupBy present in MaterializedView
loop For each expression in element.expressions
alt expression is LongLiteral (GROUP BY ordinal)
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Resolve ordinal to SelectItem
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: removeExpressionPrefix(selectItem.expression, removablePrefix)
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Validate resolved expression in groupBy and baseToViewColumnMap
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Add resolved expression to expressionsInGroupByBuilder
else expression is non ordinal
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Validate expression in groupBy and baseToViewColumnMap
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Add expression to expressionsInGroupByBuilder
end
end
else groupBy absent in MaterializedView
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Add element.expressions to expressionsInGroupByBuilder
end
end
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Rewrite ORDER BY
loop For each SortItem
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: removeSortItemPrefix(sortItem, removablePrefix)
alt sortKey is LongLiteral (ORDER BY ordinal)
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Skip baseToViewColumnMap validation
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: visitSortItem returns original SortItem
else sortKey is non ordinal
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: Validate sortKey in baseToViewColumnMap
MaterializedViewQueryOptimizer->>MaterializedViewQueryOptimizer: visitSortItem rewrites sortKey
end
end
MaterializedViewQueryOptimizer-->>Planner: Rewritten query using materialized view
Planner-->>User: Execute against materialized view when valid
Updated class diagram for MaterializedViewQueryOptimizer ordinal handlingclassDiagram
class MaterializedViewQueryOptimizer {
}
class MaterializedViewInfo {
+Optional~Set~Expression~~ getGroupBy()
+Map~Expression, Expression~ getBaseToViewColumnMap()
}
class MaterializedViewVisitor {
-Optional~Identifier~ removablePrefix
-MaterializedViewInfo materializedViewInfo
+visitQuerySpecification(QuerySpecification node, Void context) Node
+visitOrderBy(OrderBy node, Void context) Node
+visitSortItem(SortItem node, Void context) Node
+visitSimpleGroupBy(SimpleGroupBy node, Void context) Node
}
class QuerySpecification {
+Select getSelect()
+Optional~GroupBy~ getGroupBy()
}
class GroupBy {
+List~GroupingElement~ getGroupingElements()
}
class GroupingElement {
+List~Expression~ getExpressions()
}
class OrderBy {
+List~SortItem~ getSortItems()
}
class SortItem {
+Expression getSortKey()
+SortItemOrdering getOrdering()
+SortItemNullOrdering getNullOrdering()
}
class SimpleGroupBy {
+List~Expression~ getExpressions()
}
class Select {
+List~SelectItem~ getSelectItems()
}
class SelectItem {
}
class SingleColumn {
+Expression getExpression()
}
class Expression {
}
class LongLiteral {
+long getValue()
}
class Identifier {
}
MaterializedViewQueryOptimizer --> MaterializedViewVisitor
MaterializedViewVisitor --> MaterializedViewInfo
MaterializedViewVisitor ..> QuerySpecification
MaterializedViewVisitor ..> GroupBy
MaterializedViewVisitor ..> GroupingElement
MaterializedViewVisitor ..> OrderBy
MaterializedViewVisitor ..> SortItem
MaterializedViewVisitor ..> SimpleGroupBy
MaterializedViewVisitor ..> Select
Select --> SelectItem
SelectItem <|-- SingleColumn
Expression <|-- LongLiteral
MaterializedViewVisitor ..> Expression
MaterializedViewVisitor ..> LongLiteral
MaterializedViewVisitor ..> Identifier
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- When resolving GROUP BY ordinals,
selectItems.get(ordinal - 1)can throw anIndexOutOfBoundsException; consider explicitly validating the ordinal against theselectItems.size()and throwing a clearerIllegalStateExceptionwith context about the invalid ordinal. - The new
IllegalStateExceptionfor GROUP BY ordinals referencing non-SingleColumnselect items could be made more informative by including the offending SELECT item and ordinal to aid debugging.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- When resolving GROUP BY ordinals, `selectItems.get(ordinal - 1)` can throw an `IndexOutOfBoundsException`; consider explicitly validating the ordinal against the `selectItems.size()` and throwing a clearer `IllegalStateException` with context about the invalid ordinal.
- The new `IllegalStateException` for GROUP BY ordinals referencing non-`SingleColumn` select items could be made more informative by including the offending SELECT item and ordinal to aid debugging.
## Individual Comments
### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java" line_range="477-478" />
<code_context>
- if (!groupByOfMaterializedView.get().contains(expression) || !materializedViewInfo.getBaseToViewColumnMap().containsKey(expression)) {
+ // Resolve ordinal references (e.g. GROUP BY 1) to the corresponding SELECT expression
+ Expression resolved = expression;
+ if (expression instanceof LongLiteral) {
+ int ordinal = toIntExact(((LongLiteral) expression).getValue());
+ SelectItem selectItem = selectItems.get(ordinal - 1);
+ if (selectItem instanceof SingleColumn) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against invalid or out-of-range GROUP BY ordinals for clearer failures
This treats all `LongLiteral` GROUP BY expressions as valid ordinals and directly indexes `selectItems.get(ordinal - 1)`. If `ordinal <= 0` or `ordinal > selectItems.size()`, this will throw an `IndexOutOfBoundsException` with an unhelpful message. Please add an explicit bounds check and raise a descriptive error (or reuse the engine’s standard invalid-ordinal error path) to make violations easier to diagnose.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
Show resolved
Hide resolved
d8c39d2 to
e1613e1
Compare
…riting (prestodb#27422) Summary: Queries using GROUP BY/ORDER BY ordinals (e.g. `GROUP BY 1`) silently fell back to the base table because the MV optimizer runs before the analyzer resolves ordinals to column references. Fix by resolving ordinals to SELECT expressions during MV validation and passing them through unchanged during rewriting. ``` == RELEASE NOTES == General Changes * Add support for ``GROUP BY`` and ``ORDER BY`` ordinal references in materialized view query rewriting. Previously, queries like ``SELECT a, SUM(b) FROM t GROUP BY 1`` would silently skip materialized view optimization. ``` Differential Revision: D97920227
e1613e1 to
cf027dd
Compare
…riting (prestodb#27422) Summary: Pull Request resolved: prestodb#27422 Queries using GROUP BY/ORDER BY ordinals (e.g. `GROUP BY 1`) silently fell back to the base table because the MV optimizer runs before the analyzer resolves ordinals to column references. Fix by resolving ordinals to SELECT expressions during MV validation and passing them through unchanged during rewriting. ``` == RELEASE NOTES == General Changes * Add support for ``GROUP BY`` and ``ORDER BY`` ordinal references in materialized view query rewriting. Previously, queries like ``SELECT a, SUM(b) FROM t GROUP BY 1`` would silently skip materialized view optimization. ``` Differential Revision: D97920227
cf027dd to
6fb7955
Compare
|
Please add a release note - or |
…riting (prestodb#27422) Summary: Queries using GROUP BY/ORDER BY ordinals (e.g. `GROUP BY 1`) silently fell back to the base table because the MV optimizer runs before the analyzer resolves ordinals to column references. Fix by resolving ordinals to SELECT expressions during MV validation and passing them through unchanged during rewriting. ``` == RELEASE NOTES == General Changes * Add support for ``GROUP BY`` and ``ORDER BY`` ordinal references in materialized view query rewriting. Previously, queries like ``SELECT a, SUM(b) FROM t GROUP BY 1`` would silently skip materialized view optimization. ``` Differential Revision: D97920227
6fb7955 to
5b65cbd
Compare
…riting (prestodb#27422) Summary: Queries using GROUP BY/ORDER BY ordinals (e.g. `GROUP BY 1`) silently fell back to the base table because the MV optimizer runs before the analyzer resolves ordinals to column references. Fix by resolving ordinals to SELECT expressions during MV validation and passing them through unchanged during rewriting. ``` == RELEASE NOTES == General Changes * Add support for ``GROUP BY`` and ``ORDER BY`` ordinal references in materialized view query rewriting. Previously, queries like ``SELECT a, SUM(b) FROM t GROUP BY 1`` would silently skip materialized view optimization. ``` Differential Revision: D97920227
5b65cbd to
9c3fdc6
Compare
…riting (prestodb#27422) Summary: Pull Request resolved: prestodb#27422 Queries using GROUP BY/ORDER BY ordinals (e.g. `GROUP BY 1`) silently fell back to the base table because the MV optimizer runs before the analyzer resolves ordinals to column references. Fix by resolving ordinals to SELECT expressions during MV validation and passing them through unchanged during rewriting. ``` == RELEASE NOTES == General Changes * Add support for ``GROUP BY`` and ``ORDER BY`` ordinal references in materialized view query rewriting. Previously, queries like ``SELECT a, SUM(b) FROM t GROUP BY 1`` would silently skip materialized view optimization. ``` Differential Revision: D97920227
9c3fdc6 to
cacf429
Compare
cacf429 to
8b16dac
Compare
…writing (prestodb#27422) (prestodb#27422) Summary: Queries using GROUP BY/ORDER BY ordinals (e.g. `GROUP BY 1`) silently fell back to the base table because the MV optimizer runs before the analyzer resolves ordinals to column references. Fix by resolving ordinals to SELECT expressions during MV validation and passing them through unchanged during rewriting. ``` == RELEASE NOTES == General Changes * Add support for ``GROUP BY`` and ``ORDER BY`` ordinal references in materialized view query rewriting. Previously, queries like ``SELECT a, SUM(b) FROM t GROUP BY 1`` would silently skip materialized view optimization. ``` Differential Revision: D97920227 Pulled By: ceekay47
|
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks @ceekay47 for this feature. Overall looks good to me — just a few nits and one suggestion about test case additions.
...main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
Outdated
Show resolved
Hide resolved
...main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
Outdated
Show resolved
Hide resolved
...-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java
Show resolved
Hide resolved
8b16dac to
c876e1a
Compare
…writing (prestodb#27422) (prestodb#27422) Summary: Queries using GROUP BY/ORDER BY ordinals (e.g. `GROUP BY 1`) silently fell back to the base table because the MV optimizer runs before the analyzer resolves ordinals to column references. Fix by resolving ordinals to SELECT expressions during MV validation and passing them through unchanged during rewriting. ``` == RELEASE NOTES == General Changes * Add support for ``GROUP BY`` and ``ORDER BY`` ordinal references in materialized view query rewriting. Previously, queries like ``SELECT a, SUM(b) FROM t GROUP BY 1`` would silently skip materialized view optimization. ``` Differential Revision: D97920227 Pulled By: ceekay47
…writing (prestodb#27422) (prestodb#27422) (prestodb#27422) Summary: Queries using GROUP BY/ORDER BY ordinals (e.g. `GROUP BY 1`) silently fell back to the base table because the MV optimizer runs before the analyzer resolves ordinals to column references. Fix by resolving ordinals to SELECT expressions during MV validation and passing them through unchanged during rewriting. Pulled By: ceekay47 ``` == RELEASE NOTES == General Changes * Add support for ``GROUP BY`` and ``ORDER BY`` ordinal references in materialized view query rewriting. Previously, queries like ``SELECT a, SUM(b) FROM t GROUP BY 1`` would silently skip materialized view optimization. ``` ceekay47 Differential Revision: D97920227
c876e1a to
47f85a0
Compare
Summary:
Queries using GROUP BY/ORDER BY ordinals (e.g.
GROUP BY 1) silentlyfell back to the base table because the MV optimizer runs before the
analyzer resolves ordinals to column references. Fix by resolving
ordinals to SELECT expressions during MV validation and passing them
through unchanged during rewriting.
Pulled By:
ceekay47
Differential Revision: D97920227