Reject float MAP key subscripts in SubfieldExtractor instead of silently truncating #27507
Description
Your Environment
- Presto version used: Latest trunk (presto-facebook-trunk)
- Storage: Hive/HDFS
- Data source and connector used: Hive connector
- Deployment: On-prem
Expected Behavior
When a query uses a floating-point expression as a MAP subscript (e.g., m[0.99]), SubfieldExtractor should reject it with a clear error. IEEE 754 makes float equality unreliable (0.1 + 0.2 != 0.3, NaN != NaN, -0.0 == +0.0), so MAP lookups by float key are a correctness hazard.
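The three IEEE 754 cases mentioned above can be checked with a small standalone Java program. This is only an illustration; the class and method names are hypothetical and are not part of Presto.

```java
// Standalone illustration; FloatKeyEquality is a hypothetical name, not Presto code.
final class FloatKeyEquality {
    // Compares two doubles with the primitive == operator (IEEE 754 semantics).
    static boolean ieeeEquals(double a, double b) {
        return a == b;
    }

    public static void main(String[] args) {
        // 0.1 + 0.2 is not exactly representable, so the sum differs from 0.3.
        System.out.println(ieeeEquals(0.1 + 0.2, 0.3));          // false
        // NaN never compares equal to anything, including itself.
        System.out.println(ieeeEquals(Double.NaN, Double.NaN)); // false
        // Negative and positive zero compare equal despite distinct bit patterns.
        System.out.println(ieeeEquals(-0.0, 0.0));              // true
    }
}
```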
Current Behavior
SubfieldExtractor.toSubfield() calls ((Number) value).longValue() on floating-point MAP key subscripts, which silently truncates them to integers:
- m[0.99] → LongSubscript(0)
- m[-1.7] → LongSubscript(-1)
- m[1.5] → LongSubscript(1)
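The truncation can be reproduced outside Presto with the same cast. The sketch below is not the actual SubfieldExtractor code; the class and helper names are hypothetical, and only the `((Number) value).longValue()` expression is taken from the description above.

```java
// Standalone illustration; TruncationDemo and toLongSubscript are hypothetical names.
final class TruncationDemo {
    // Same cast as described above: any Number is narrowed to long,
    // which drops the fractional part (rounds toward zero).
    static long toLongSubscript(Object value) {
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        System.out.println(toLongSubscript(0.99));  // 0
        System.out.println(toLongSubscript(-1.7));  // -1
        System.out.println(toLongSubscript(1.5));   // 1
        // Distinct keys collapse onto the same subscript.
        System.out.println(toLongSubscript(0.01) == toLongSubscript(0.99)); // true
    }
}
```

Because truncation rounds toward zero, every key in the open interval (-1, 1) maps to subscript 0, so the subscript alone cannot identify the requested key.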
The truncated subscript is sent to the Hive reader, which compensates via HiveConnectorUtil.makeFloatingPointMapKeyFilter() by expanding the truncated long back into a floating-point range filter. This means correctness currently depends on a lossy truncation at the optimizer being precisely undone by range expansion at the reader: a fragile handoff with no documentation or contract between the two layers.
Possible Solution
Reject floating-point MAP key subscripts in SubfieldExtractor with a clear error, e.g.:
MAP subscript with floating-point key (DOUBLE/REAL) is not supported for subfield pruning.
Floating-point equality is unreliable. Use INTEGER or VARCHAR keys instead.
The Subfield model only supports 4 path element types (AllSubscripts, NestedField, StringSubscript, LongSubscript). There is no DoubleSubscript, and adding one would require changes across the optimizer and all readers. Rejecting is simpler and safer than the current silent truncation.
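A minimal sketch of the proposed check, written as a standalone class so it can be run in isolation. The names SubscriptCheck and toLongSubscript are hypothetical, and IllegalArgumentException stands in for whatever exception type the real SubfieldExtractor would use; only the error text comes from the proposal above.

```java
// Standalone sketch of the proposed rejection; not the actual Presto implementation.
final class SubscriptCheck {
    static final String FLOAT_KEY_MESSAGE =
            "MAP subscript with floating-point key (DOUBLE/REAL) is not supported for subfield pruning. "
                    + "Floating-point equality is unreliable. Use INTEGER or VARCHAR keys instead.";

    // Converts a constant subscript value to a long, refusing floating-point values
    // instead of truncating them.
    static long toLongSubscript(Object value) {
        if (value instanceof Double || value instanceof Float) {
            throw new IllegalArgumentException(FLOAT_KEY_MESSAGE);
        }
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        // String.valueOf handles a null value without throwing.
        throw new IllegalArgumentException("Unsupported MAP subscript value: " + String.valueOf(value));
    }

    public static void main(String[] args) {
        System.out.println(toLongSubscript(42L)); // 42
        try {
            toLongSubscript(0.99);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the boxed type before the `Number` branch matters here, since `Double` and `Float` are both `Number` subclasses and would otherwise fall through to `longValue()`.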
Steps to Reproduce
- Create a table with a MAP<DOUBLE, VARCHAR> column
- Query: SELECT m[0.99] FROM t
- Observe that SubfieldExtractor produces LongSubscript(0) for the key 0.99
- No warning or error is surfaced to the user
Context
We discovered this while working on subfield tracking in Axiom (a new SQL planning engine that uses Velox for execution). Axiom's optimizer hit a crash on
non-integer MAP keys because it called integerValue() unconditionally. While fixing it, we traced the Presto Java code path and found the silent truncation.
Relevant code:
- SubfieldExtractor.java:150-160 (truncation via longValue())
- MapSubscriptOperator.java:91 (accepts double.class)