Commit e9bad0c
feat: [velox+prestissimo][iceberg] Iceberg V3 full C++ support: deletion vectors, equality deletes, sequence number conflict resolution, DV writer, DWRF data sink, Manifold filesystem, PUFFIN protocol (#27462)
Summary:
X-link: facebookincubator/velox#16959
Combined velox/prestissimo diffs for Iceberg V3 C++ support:
- Improve IcebergSplitReader error handling and fix test file handle leaks
- Add Iceberg V3 deletion vector support (DeletionVectorReader)
- Add Iceberg equality delete file reader (EqualityDeleteFileReader)
- Add sequence number conflict resolution for equality deletes
- Add sequence number conflict resolution for positional deletes and deletion vectors
- Add Iceberg V3 deletion vector writer (DeletionVectorWriter)
- Add DWRF file format support for Iceberg data sink
- Add Manifold filesystem support with CAT token authentication
- Reformat FileContent enum to multi-line for extensibility
- Wire PUFFIN file format through C++ protocol and connector layer
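The sequence number conflict resolution items above can be illustrated with a minimal sketch. It assumes the applicability rule from the Iceberg table spec: an equality delete applies only to data files with a strictly smaller data sequence number, while positional deletes and deletion vectors also apply to data files with the same sequence number. The names `DeleteKind` and `deleteApplies` are hypothetical and are not taken from this commit; partition and file-path matching are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum for illustration; not the commit's FileContent enum.
enum class DeleteKind { kPositional, kDeletionVector, kEquality };

// Returns true if a delete file with sequence number `deleteSeq` should be
// applied to a data file with data sequence number `dataSeq`.
// Only the sequence number condition is modeled here.
inline bool deleteApplies(DeleteKind kind, int64_t dataSeq, int64_t deleteSeq) {
  assert(dataSeq >= 0 && deleteSeq >= 0);
  switch (kind) {
    case DeleteKind::kEquality:
      // Equality deletes never affect rows written in the same commit.
      return dataSeq < deleteSeq;
    case DeleteKind::kPositional:
    case DeleteKind::kDeletionVector:
      // Position-based deletes may target files from the same commit.
      return dataSeq <= deleteSeq;
  }
  return false;
}
```

Under this assumed rule, a delete file is skipped for any data file written after it, which is what keeps rows re-inserted in a later commit from being deleted by an older delete file.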
Thrift ODR Violation Blocking Native Parquet Writes in Velox
Problem
Velox's Parquet writer crashes with SIGSEGV when linked into any binary that also uses FBThrift (e.g., the Prestissimo presto_server). The crash is caused by a violation of the C++ One Definition Rule (ODR).
Root Cause
Velox's Parquet writer depends on OSS Apache Thrift (third-party2/apache-thrift/) for serializing Parquet page headers and file metadata. FBThrift (fbcode/thrift/) is Meta's fork used by RPC services. Both libraries declare classes in the same namespace (apache::thrift::protocol::TProtocol, apache::thrift::transport::TTransport, etc.) but with incompatible class layouts:
| | OSS Apache Thrift (Parquet) | FBThrift (Prestissimo RPC) |
|---|---|---|
| Namespace | apache::thrift | apache::thrift |
| TTransport size | ~40 bytes (has TConfiguration shared_ptr, message size fields) | ~8 bytes (vtable pointer only) |
| fd_ offset in TFDTransport | ~40+ | ~8 |
When both are linked into one binary, the linker picks one definition. Code compiled against the other layout reads wrong memory offsets → SIGSEGV.
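The layout mismatch can be made concrete with a small sketch. The classes below are hypothetical stand-ins placed in separate namespaces (`oss_like`, `fb_like`) so the example itself stays well-defined; they are not the real Thrift classes, and the member names are assumptions chosen to mirror the table above. The exact sizes depend on the platform ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for the OSS-style layout: vtable pointer plus extra state.
namespace oss_like {
struct TConfiguration {};
struct TTransport {
  virtual ~TTransport() = default;
  std::shared_ptr<TConfiguration> configuration_;
  long remainingMessageSize_ = 0;
  long knownMessageSize_ = 0;
};
struct TFDTransport : TTransport {
  int fd_ = -1;
};
} // namespace oss_like

// Stand-in for the FBThrift-style layout: vtable pointer only.
namespace fb_like {
struct TTransport {
  virtual ~TTransport() = default;
};
struct TFDTransport : TTransport {
  int fd_ = -1;
};
} // namespace fb_like

// Byte offset of fd_ within a transport object, measured at run time.
template <typename T>
std::ptrdiff_t fdOffset() {
  T t;
  const char* base = reinterpret_cast<const char*>(&t);
  const char* field = reinterpret_cast<const char*>(&t.fd_);
  assert(field >= base);
  return field - base;
}

// True when the two stand-in layouts disagree on size or on where fd_ lives.
inline bool layoutsDisagree() {
  return sizeof(oss_like::TTransport) != sizeof(fb_like::TTransport) ||
      fdOffset<oss_like::TFDTransport>() != fdOffset<fb_like::TFDTransport>();
}
```

If both definitions shared one fully qualified name, code compiled against one layout would read `fd_` (or the `shared_ptr` control block) at the offset of the other, which is consistent with the null control block seen in the crash signature.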
Crash Signature
Signal 11 (SIGSEGV) (0x0)
std::_Sp_counted_base<>::_M_release_slow_last_use() ← null shared_ptr control block
apache::thrift::protocol::TProtocol::TProtocol() ← wrong TTransport layout
ThriftSerializer::ThriftSerializer() ← Parquet page header serialization
SerializedPageWriter::SerializedPageWriter()
Writer::close() → flush() → writeTable()
IcebergDataSink::closeInternal() ← triggered by any native Parquet write
Impact
- All native Parquet writes (INSERT, CTAS) crash in any Velox binary that links FBThrift
- DWRF/ORC writes are unaffected (they don't use thrift serialization)
- Parquet reads are unaffected (reads use a different thrift code path that happens not to trigger the ODR violation)
- Affects Prestissimo, and potentially any Velox embedder (Gluten/Spark, etc.) that links both Parquet and another thrift variant
Prior Art
- SEV 635079: the same ODR violation caused SIGSEGV crashes in Spark F3 pipelines (March 2026, SEV-2)
- Apache Arrow already solved this by vendoring OSS thrift in the private_parquet::apache::thrift namespace (D47918122, 2023)
- T262970501: tracking task for addressing the Parquet thrift dependency
- GitHub issue #13175: upstream tracking
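The namespace-vendoring approach mentioned for Apache Arrow can be sketched as follows. This is a minimal illustration, assuming the vendored copy is simply compiled inside an outer `private_parquet` namespace; the class bodies, the `flavor()` method, and the alias names are hypothetical and exist only to show that the two definitions become distinct types with distinct symbol names.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in for the RPC-side definition that keeps the public name.
namespace apache::thrift::transport {
struct TTransport {
  virtual ~TTransport() = default;
  virtual std::string flavor() const { return "rpc"; }
};
} // namespace apache::thrift::transport

// Stand-in for the vendored copy: same inner names, different outer namespace.
namespace private_parquet::apache::thrift::transport {
struct TTransport {
  virtual ~TTransport() = default;
  virtual std::string flavor() const { return "parquet"; }
  long knownMessageSize_ = 0; // extra state, so the layouts differ
};
} // namespace private_parquet::apache::thrift::transport

// Leading :: avoids accidentally resolving to private_parquet::apache.
using RpcTransport = ::apache::thrift::transport::TTransport;
using ParquetTransport = ::private_parquet::apache::thrift::transport::TTransport;

// The two classes are unrelated types, so the ODR is no longer violated.
static_assert(!std::is_same_v<RpcTransport, ParquetTransport>);

// Both definitions can be used side by side in one translation unit.
inline std::string describeBoth() {
  RpcTransport rpc;
  ParquetTransport parquet;
  assert(rpc.flavor() != parquet.flavor());
  return rpc.flavor() + "+" + parquet.flavor();
}
```

Because the fully qualified names differ, the linker keeps both definitions, and each caller sees the layout it was compiled against.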
Solution:
X-link: facebookincubator/velox#16019, which will help fix this.
Differential Revision: D987047181
parent fc36697, commit e9bad0c
File tree
3 files changed, +37 -9 lines changed
- presto-native-execution/presto_cpp
  - main/connectors
  - presto_protocol/connector/iceberg
Lines changed: 29 additions & 6 deletions
Lines changed: 2 additions & 1 deletion
Lines changed: 6 additions & 2 deletions