fix(connector): Skip floating-point MAP key subfield extraction#27511
Open
officialasishkumar wants to merge 1 commit intoprestodb:masterfrom
Open
fix(connector): Skip floating-point MAP key subfield extraction#27511officialasishkumar wants to merge 1 commit intoprestodb:masterfrom
officialasishkumar wants to merge 1 commit intoprestodb:masterfrom
Conversation
Floating-point map keys cannot be represented as Subfield paths without lossy coercion to LongSubscript. Keep those expressions out of subfield extraction and pushdown instead of truncating them. Apply the guard in both Hive's SubfieldExtractor and the planner-side PushdownSubfields logic, including map_subset and map_filter helpers, and add regression coverage for the extractor, domain translation, optimizer extraction, and Hive logical planning. Fixes prestodb#27507 Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
|
|
Contributor
Reviewer's GuideImplements a correctness fix to subfield pushdown for MAPs keyed by DOUBLE or REAL by teaching both the planner and Hive subfield extraction logic to skip subfield extraction for floating‑point MAP keys, and adds focused unit and integration tests to verify no subfield pruning/pushdown occurs for these keys while preserving existing behavior for integer and varchar keys. Sequence diagram for Hive SubfieldExtractor with floating-point MAP keyssequenceDiagram
actor HiveSplitReader
participant SubfieldExtractor
HiveSplitReader->>SubfieldExtractor: extract(expression)
activate SubfieldExtractor
SubfieldExtractor->>SubfieldExtractor: toSubfield(expression, functionResolution, expressionOptimizer, connectorSession)
alt expression is MAP subscript with constant key
SubfieldExtractor->>SubfieldExtractor: hasFloatingPointMapKey(mapType)
alt map key type is DOUBLE or REAL
SubfieldExtractor-->>HiveSplitReader: Optional.empty
HiveSplitReader->>HiveSplitReader: read full MAP column (no subfield pruning)
else map key type is integer or varchar
SubfieldExtractor->>SubfieldExtractor: add Subfield.LongSubscript(key.longValue())
SubfieldExtractor-->>HiveSplitReader: Optional.of(subfield)
HiveSplitReader->>HiveSplitReader: push down subfield to storage
end
else expression is other supported pattern
SubfieldExtractor-->>HiveSplitReader: Optional.of(subfield or empty)
deactivate SubfieldExtractor
HiveSplitReader->>HiveSplitReader: existing pruning behavior
end
Updated class diagram for PushdownSubfields and SubfieldExtractor floating-point MAP key handlingclassDiagram
class PushdownSubfields {
+PlanOptimizerResult optimize(PlanNode plan, Session session, TypeProvider types, ...)
+static Optional~List~Subfield~~~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession, FunctionAndTypeManager functionAndTypeManager, boolean isPushdownSubfieldsForMapFunctionsEnabled)
}
class Rewriter {
+PlanNode rewrite(PlanNode node, Context context, PlanRewriter planRewriter)
+static Optional~List~Subfield~~~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession, FunctionAndTypeManager functionAndTypeManager, boolean isPushdownSubfieldsForMapFunctionsEnabled)
-static Optional~List~Subfield~~~ extractSubfieldsFromArray(ConstantExpression constantArray, VariableReferenceExpression mapVariable)
-static Optional~Subfield~ extractSubfieldsFromSingleValue(ConstantExpression mapKey, VariableReferenceExpression mapVariable)
-static boolean hasFloatingPointMapKey(Type type)
-static boolean isFloatingPointType(Type type)
-static NestedField nestedField(String name)
}
class SubfieldExtractor {
-RowExpression expression
-FunctionResolution functionResolution
-ExpressionOptimizer expressionOptimizer
-ConnectorSession connectorSession
+Optional~Subfield~ extract(RowExpression expression)
-static Optional~Subfield~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession)
-static boolean hasSubscripts(Optional~Subfield~ subfield)
-static boolean hasFloatingPointMapKey(Type type)
-static boolean isFloatingPointType(Type type)
}
class Type {
}
class MapType {
+Type getKeyType()
+Type getValueType()
}
class Subfield {
class PathElement
+List~PathElement~ getPath()
class LongSubscript
}
PushdownSubfields o-- Rewriter
SubfieldExtractor ..> Subfield
Rewriter ..> Subfield
Rewriter ..> MapType
SubfieldExtractor ..> MapType
MapType --|> Type
SubfieldExtractor ..> Type
Rewriter ..> Type
Subfield ..> Subfield.PathElement
Subfield ..> Subfield.LongSubscript
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The floating-point map-key checks (
hasFloatingPointMapKey/isFloatingPointType) are now duplicated in bothPushdownSubfieldsandSubfieldExtractor; consider centralizing this logic in a shared utility (or at least aligning naming and placement) to reduce divergence risk. - The new REAL-key tests use
floatToRawIntBitsto buildREALconstants, which is a bit opaque; adding a small helper or comment explaining why the bit-cast is required would make these tests easier to maintain and less error-prone.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The floating-point map-key checks (`hasFloatingPointMapKey` / `isFloatingPointType`) are now duplicated in both `PushdownSubfields` and `SubfieldExtractor`; consider centralizing this logic in a shared utility (or at least aligning naming and placement) to reduce divergence risk.
- The new REAL-key tests use `floatToRawIntBits` to build `REAL` constants, which is a bit opaque; adding a small helper or comment explaining why the bit-cast is required would make these tests easier to maintain and less error-prone.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Skip subfield extraction for MAP subscripts whose key type is DOUBLE or REAL, because those keys cannot be represented losslessly as
Subfieldpaths.Apply the same guard in both Hive's
SubfieldExtractorand planner-sidePushdownSubfields, including themap_subsetandmap_filterhelpers.Motivation and Context
Fixes #27507.
Subfieldcan represent integer and string MAP subscripts, but floating-point MAP keys were being coerced throughlongValue(), which could incorrectly turn expressions likex[0.99]intox[0]during pruning and pushdown. The safe behavior is to leave those expressions unpushed.Impact
Correctness fix for Hive and planner subfield pushdown. Queries over MAP columns keyed by DOUBLE or REAL keep working, but those specific key accesses no longer participate in lossy subfield pruning and pushdown.
Test Plan
TestPushdownSubfieldscoverage for direct MAP subscripts andmap_subset/map_filter../mvnw -pl presto-main-base -am -Dair.check.skip-all -DskipUI -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TestPushdownSubfields test../mvnw -pl presto-hive -am -Dair.check.skip-all -DskipUI -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TestSubfieldExtractor,TestDomainTranslator test.TestHiveLogicalPlannercoverage for the pushdown and pruning cases; I did not execute that class locally because the available JDK 25 runtime trips a HadoopSubject.getSubjectincompatibility during Hive query-runner initialization.Contributor checklist
Release Notes
Summary by Sourcery
Prevent subfield pushdown and pruning for map accesses with floating-point keys and add regression tests across planner and Hive components.
Bug Fixes:
Tests: