Skip to content

Commit e9bad0c

Browse files
apurva-metafacebook-github-bot
authored andcommitted
feat: [velox+prestissimo][iceberg] Iceberg V3 full C++ support: deletion vectors, equality deletes, sequence number conflict resolution, DV writer, DWRF data sink, Manifold filesystem, PUFFIN protocol (#27462)
Summary: X-link: facebookincubator/velox#16959 Combined velox/prestissimo diffs for Iceberg V3 C++ support: - Improve IcebergSplitReader error handling and fix test file handle leaks - Add Iceberg V3 deletion vector support (DeletionVectorReader) - Add Iceberg equality delete file reader (EqualityDeleteFileReader) - Add sequence number conflict resolution for equality deletes - Add sequence number conflict resolution for positional deletes and deletion vectors - Add Iceberg V3 deletion vector writer (DeletionVectorWriter) - Add DWRF file format support for Iceberg data sink - Add Manifold filesystem support with CAT token authentication - Reformat FileContent enum to multi-line for extensibility - Wire PUFFIN file format through C++ protocol and connector layer Thrift ODR Violation Blocking Native Parquet Writes in Velox Problem Velox's Parquet writer crashes with SIGSEGV when linked into any binary that also uses FBThrift (e.g., Prestissimo presto_server). The crash is a C++ One Definition Rule (ODR) violation. Root Cause Velox's Parquet writer depends on OSS Apache Thrift (third-party2/apache-thrift/) for serializing Parquet page headers and file metadata. FBThrift (fbcode/thrift/) is Meta's fork used by RPC services. Both libraries declare classes in the same namespace (apache::thrift::protocol::TProtocol, apache::thrift::transport::TTransport, etc.) but with incompatible class layouts: OSS Apache Thrift (Parquet) FBThrift (Prestissimo RPC) Namespace apache::thrift apache::thrift TTransport size ~40 bytes (has TConfiguration shared_ptr, message size fields) ~8 bytes (vtable pointer only) fd_ offset in TFDTransport ~40+ ~8 When both are linked into one binary, the linker picks one definition. Code compiled against the other layout reads wrong memory offsets → SIGSEGV. Crash Signature Signal 11 (SIGSEGV) (0x0) std::_Sp_counted_base<>::_M_release_slow_last_use() ← null shared_ptr control block apache::thrift::protocol::TProtocol::TProtocol() ← wrong TTransport layout ThriftSerializer::ThriftSerializer() ← Parquet page header serialization SerializedPageWriter::SerializedPageWriter() Writer::close() → flush() → writeTable() IcebergDataSink::closeInternal() ← triggered by any native Parquet write Impact All native Parquet writes crash (INSERT, CTAS) in any Velox binary that links FBThrift DWRF/ORC writes are unaffected (they don't use thrift serialization) Parquet reads are unaffected (reads use a different thrift code path that happens to not trigger the ODR) Affects Prestissimo, and potentially any Velox embedder (Gluten/Spark, etc.) that links both Parquet and another thrift variant Prior Art SEV 635079 — same ODR caused SIGSEGV crashes in Spark F3 pipelines (March 2026, SEV-2) Apache Arrow already solved this by vendoring OSS thrift in private_parquet::apache::thrift namespace (D47918122, 2023) T262970501 — tracking task for addressing the Parquet thrift dependency GitHub issue #13175 — upstream tracking Solution: X-link: facebookincubator/velox#16019 this will help fix it. Differential Revision: D98704718
1 parent fc36697 commit e9bad0c

File tree

3 files changed

+37
-9
lines changed

3 files changed

+37
-9
lines changed

presto-native-execution/presto_cpp/main/connectors/IcebergPrestoToVeloxConnector.cpp

Lines changed: 29 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -37,9 +37,19 @@ velox::connector::hive::iceberg::FileContent toVeloxFileContent(
3737
velox::dwio::common::FileFormat toVeloxFileFormat(
3838
const presto::protocol::iceberg::FileFormat format) {
3939
if (format == protocol::iceberg::FileFormat::ORC) {
40-
return velox::dwio::common::FileFormat::ORC;
40+
// Iceberg metadata stores DWRF as ORC since Iceberg has no native DWRF
41+
// format. At Meta, all Iceberg ORC files are actually DWRF-encoded.
42+
return velox::dwio::common::FileFormat::DWRF;
4143
} else if (format == protocol::iceberg::FileFormat::PARQUET) {
4244
return velox::dwio::common::FileFormat::PARQUET;
45+
} else if (format == protocol::iceberg::FileFormat::PUFFIN) {
46+
// PUFFIN is used for Iceberg V3 deletion vectors. The DeletionVectorReader
47+
// reads raw binary from the file and does not use the DWRF/Parquet reader,
48+
// so we map PUFFIN to DWRF as a placeholder — the format value is not
49+
// actually used by the reader. This mapping is only safe for deletion
50+
// vector files; if PUFFIN is encountered for other file content types,
51+
// the DV routing logic in toHiveIcebergSplit() must reclassify it first.
52+
return velox::dwio::common::FileFormat::DWRF;
4353
}
4454
VELOX_UNSUPPORTED("Unsupported file format: {}", fmt::underlying(format));
4555
}
@@ -171,7 +181,7 @@ IcebergPrestoToVeloxConnector::toVeloxSplit(
171181
const protocol::ConnectorId& catalogId,
172182
const protocol::ConnectorSplit* connectorSplit,
173183
const protocol::SplitContext* splitContext) const {
174-
auto icebergSplit =
184+
const auto* icebergSplit =
175185
dynamic_cast<const protocol::iceberg::IcebergSplit*>(connectorSplit);
176186
VELOX_CHECK_NOT_NULL(
177187
icebergSplit, "Unexpected split type {}", connectorSplit->_type);
@@ -191,14 +201,27 @@ IcebergPrestoToVeloxConnector::toVeloxSplit(
191201
std::vector<velox::connector::hive::iceberg::IcebergDeleteFile> deletes;
192202
deletes.reserve(icebergSplit->deletes.size());
193203
for (const auto& deleteFile : icebergSplit->deletes) {
194-
std::unordered_map<int32_t, std::string> lowerBounds(
204+
const std::unordered_map<int32_t, std::string> lowerBounds(
195205
deleteFile.lowerBounds.begin(), deleteFile.lowerBounds.end());
196206

197-
std::unordered_map<int32_t, std::string> upperBounds(
207+
const std::unordered_map<int32_t, std::string> upperBounds(
198208
deleteFile.upperBounds.begin(), deleteFile.upperBounds.end());
199209

200-
velox::connector::hive::iceberg::IcebergDeleteFile icebergDeleteFile(
201-
toVeloxFileContent(deleteFile.content),
210+
// Iceberg V3 deletion vectors arrive from the coordinator as
211+
// POSITION_DELETES with PUFFIN format. Reclassify them as
212+
// kDeletionVector so that IcebergSplitReader routes them to
213+
// DeletionVectorReader instead of PositionalDeleteFileReader.
214+
velox::connector::hive::iceberg::FileContent veloxContent =
215+
toVeloxFileContent(deleteFile.content);
216+
if (veloxContent ==
217+
velox::connector::hive::iceberg::FileContent::kPositionalDeletes &&
218+
deleteFile.format == protocol::iceberg::FileFormat::PUFFIN) {
219+
veloxContent =
220+
velox::connector::hive::iceberg::FileContent::kDeletionVector;
221+
}
222+
223+
const velox::connector::hive::iceberg::IcebergDeleteFile icebergDeleteFile(
224+
veloxContent,
202225
deleteFile.path,
203226
toVeloxFileFormat(deleteFile.format),
204227
deleteFile.recordCount,

presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.cpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -320,7 +320,8 @@ static const std::pair<FileFormat, json> FileFormat_enum_table[] =
320320
{FileFormat::ORC, "ORC"},
321321
{FileFormat::PARQUET, "PARQUET"},
322322
{FileFormat::AVRO, "AVRO"},
323-
{FileFormat::METADATA, "METADATA"}};
323+
{FileFormat::METADATA, "METADATA"},
324+
{FileFormat::PUFFIN, "PUFFIN"}};
324325
void to_json(json& j, const FileFormat& e) {
325326
static_assert(std::is_enum<FileFormat>::value, "FileFormat must be an enum!");
326327
const auto* it = std::find_if(

presto-native-execution/presto_cpp/presto_protocol/connector/iceberg/presto_protocol_iceberg.h

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -79,12 +79,16 @@ void to_json(json& j, const ChangelogSplitInfo& p);
7979
void from_json(const json& j, ChangelogSplitInfo& p);
8080
} // namespace facebook::presto::protocol::iceberg
8181
namespace facebook::presto::protocol::iceberg {
82-
enum class FileContent { DATA, POSITION_DELETES, EQUALITY_DELETES };
82+
enum class FileContent {
83+
DATA,
84+
POSITION_DELETES,
85+
EQUALITY_DELETES,
86+
};
8387
extern void to_json(json& j, const FileContent& e);
8488
extern void from_json(const json& j, FileContent& e);
8589
} // namespace facebook::presto::protocol::iceberg
8690
namespace facebook::presto::protocol::iceberg {
87-
enum class FileFormat { ORC, PARQUET, AVRO, METADATA };
91+
enum class FileFormat { ORC, PARQUET, AVRO, METADATA, PUFFIN };
8892
extern void to_json(json& j, const FileFormat& e);
8993
extern void from_json(const json& j, FileFormat& e);
9094
} // namespace facebook::presto::protocol::iceberg

0 commit comments

Comments
 (0)