Reject float MAP key subscripts in SubfieldExtractor instead of silently truncating #27507

@natashasehgal

Description

Your Environment

  • Presto version used: Latest trunk (presto-facebook-trunk)
  • Storage: Hive/HDFS
  • Data source and connector used: Hive connector
  • Deployment: On-prem

Expected Behavior

When a query uses a floating-point expression as a MAP subscript (e.g., m[0.99]), SubfieldExtractor should reject it with a clear error. IEEE 754 makes float equality unreliable (0.1 + 0.2 != 0.3, NaN != NaN, -0.0 == +0.0), so MAP lookups by float key are a correctness hazard.
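The three IEEE 754 cases named above can be checked directly in plain Java. This is a minimal standalone sketch, not Presto code; the class and method names are made up for illustration. The last method adds one related observation: boxed `Double.equals` compares bit patterns, so it disagrees with `==` on signed zeros.

```java
final class FloatEqualityDemo {
    // Adds 0.1 and 0.2 in double precision and compares the sum to 0.3.
    static boolean sumEqualsLiteral() {
        double sum = 0.1 + 0.2;
        return sum == 0.3;
    }

    // NaN compares unequal to every value under ==, including itself.
    static boolean nanEqualsItself() {
        double nan = Double.NaN;
        return nan == nan;
    }

    // Negative and positive zero compare equal under ==.
    static boolean signedZerosEqual() {
        return -0.0 == 0.0;
    }

    // Double.equals compares bit patterns, so the two zeros are NOT equal here.
    static boolean boxedSignedZerosEqual() {
        return Double.valueOf(-0.0).equals(Double.valueOf(0.0));
    }

    public static void main(String[] args) {
        System.out.println(sumEqualsLiteral());      // false
        System.out.println(nanEqualsItself());       // false
        System.out.println(signedZerosEqual());      // true
        System.out.println(boxedSignedZerosEqual()); // false
    }
}
```

Which of these comparison semantics a given MAP implementation applies to its keys is a separate question that this sketch does not answer.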

Current Behavior

SubfieldExtractor.toSubfield() calls ((Number) value).longValue() on floating-point MAP key subscripts, which silently truncates them to integers:

  • m[0.99] → LongSubscript(0)
  • m[-1.7] → LongSubscript(-1)
  • m[1.5] → LongSubscript(1)
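The truncation is ordinary Java narrowing behavior and can be reproduced outside Presto. The sketch below mirrors the cast-and-`longValue()` pattern described above; `truncateKey` is a hypothetical stand-in, not the actual `SubfieldExtractor` method. It also shows that distinct keys such as 0.99 and 0.01 collapse to the same long.

```java
final class TruncationDemo {
    // Hypothetical stand-in for the cast-and-truncate pattern described in the issue.
    // Number.longValue() on a Double rounds toward zero.
    static long truncateKey(Object value) {
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        System.out.println(truncateKey(0.99)); // 0
        System.out.println(truncateKey(-1.7)); // -1
        System.out.println(truncateKey(1.5));  // 1
        System.out.println(truncateKey(0.01)); // 0, same subscript as 0.99
    }
}
```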

The truncated subscript is sent to the Hive reader, which compensates via HiveConnectorUtil.makeFloatingPointMapKeyFilter() by expanding the truncated long back into a floating-point range filter. This means correctness currently depends on a lossy truncation at the optimizer being precisely undone by range expansion at the reader, a fragile handoff with no documentation or contract between the two layers.

Possible Solution

Reject floating-point MAP key subscripts in SubfieldExtractor with a clear error, e.g.:

MAP subscript with floating-point key (DOUBLE/REAL) is not supported for subfield pruning.
Floating-point equality is unreliable. Use INTEGER or VARCHAR keys instead.

The Subfield model supports only four path element types (AllSubscripts, NestedField, StringSubscript, LongSubscript). There is no DoubleSubscript, and adding one would require changes across the optimizer and all readers. Rejecting is simpler and safer than the current silent truncation.
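One possible shape for the rejection is sketched below. `SubscriptGuard` and `toLongSubscriptIndex` are hypothetical names, and the sketch assumes the constant arrives as a boxed `Double` or `Float`. The issue does not show how Presto represents DOUBLE and REAL constants at this point, so an actual fix may need to check the SQL type of the subscript expression rather than the Java class of its value.

```java
final class SubscriptGuard {
    // Error text taken from the proposal in this issue.
    static final String MESSAGE =
            "MAP subscript with floating-point key (DOUBLE/REAL) is not supported for subfield pruning. "
                    + "Floating-point equality is unreliable. Use INTEGER or VARCHAR keys instead.";

    // Hypothetical helper: returns the long subscript for integral values and
    // rejects floating-point values instead of truncating them.
    // Assumption: floating-point constants are boxed as Double or Float.
    static long toLongSubscriptIndex(Object value) {
        if (value instanceof Double || value instanceof Float) {
            throw new IllegalArgumentException(MESSAGE);
        }
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        throw new IllegalArgumentException("Unsupported subscript value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toLongSubscriptIndex(42L)); // 42
        try {
            toLongSubscriptIndex(0.99);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The exception type is also a placeholder; Presto code would more likely raise one of its own error types so the message reaches the user with a proper error code.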

Steps to Reproduce

  1. Create a table with a MAP<DOUBLE, VARCHAR> column
  2. Query: SELECT m[0.99] FROM t
  3. Observe that SubfieldExtractor produces LongSubscript(0) for the key 0.99
  4. No warning or error is surfaced to the user

Context

We discovered this while working on subfield tracking in Axiom (a new SQL planning engine that uses Velox for execution). Axiom's optimizer hit a crash on non-integer MAP keys because it called integerValue() unconditionally. While fixing it, we traced the Presto Java code path and found the silent truncation.

Relevant code:

  • SubfieldExtractor.java:150-160 — truncation via longValue()
  • MapSubscriptOperator.java:91 — accepts double.class

https://github.com/prestodb/presto/blob/master/presto-hive-common/src/main/java/com/facebook/presto/hive/SubfieldExtractor.java#L151

https://github.com/prestodb/presto/blob/master/presto-main-base/src/main/java/com/facebook/presto/operator/scalar/MapSubscriptOperator.java#L91
