fix(manifest): allow ManifestListWriter to reference lower-version manifests#1030
Merged
zeroshade merged 1 commit intoapache:mainfrom May 6, 2026
Merged
Conversation
Member
|
Can you rebase with main? That'll fix the failing CI |
zeroshade
approved these changes
May 6, 2026
…nifests Per the Iceberg spec, a v2 manifest list may reference v1 manifest files (and a v3 list may reference v1 or v2 manifests) so that a table can be upgraded without rewriting historical manifests. The existing exact-match gate in AddManifests rejects this, breaking every commit on a table whose metadata has been upgraded from v1 to v2 via Transaction.UpgradeFormatVersion or out-of-band by another engine. Replace the gate with file.Version() > m.version. Newer-than-writer inputs remain rejected because the v2 entry schema has no place for v3 fields such as first_row_id. Lower-than-writer inputs are accepted because the in-memory manifestFile produced for v1 inputs already carries the spec's inheritance values (Content=data, SeqNumber=MinSeqNumber=0) and can be encoded directly against the v2/v3 entry schema. The existing first_row_id assignment for data manifests with nil FirstRowIDValue covers v1 inputs in v3 lists. Adds round-trip tests for v1-in-v2 and v1+v2-in-v3, plus a regression test for the still-rejected v3-in-v2 downgrade direction.
d88b718 to
2e7b874
Compare
Merged
4 tasks
Jeffail
added a commit
to redpanda-data/connect
that referenced
this pull request
May 7, 2026
Pulls in apache/iceberg-go#1030, which fixes the upstream ManifestListWriter rejecting v1 manifest files when writing a v2 manifest list. Without this, every commit on a table that was upgraded from v1 to v2 (whether via Transaction.UpgradeFormatVersion in the committer or out-of-band by another engine) failed with "ManifestListWriter only supports version 2 manifest files" because the historical v1 manifests surfaced through existingManifests() on every commit and the v2 writer rejected them. Adds a regression test that pins the upstream behaviour: a v1 ManifestFile passed to WriteManifestList(2, ...) must round-trip and the decoded entry must inherit content=data and sequence_number=min_sequence_number=0 per the Iceberg spec.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
ManifestListWriter.AddManifestsrequires every input manifest file's version to match the writer's version exactly, which contradicts the Iceberg spec: a v2 manifest list must be able to reference v1 manifest files so that v1 tables can be upgraded without rewriting historical manifests. Java'sManifestListWriterhandles this;iceberg-godoes not.The practical symptom for downstream users is that any commit to a table whose metadata has been upgraded from v1 to v2 — whether via
Transaction.UpgradeFormatVersionor out-of-band by another engine — fails withinvalid argument: ManifestListWriter only supports version 2 manifest files. The v1 manifest files written before the upgrade are never rewritten and continue to surface viasnapshotProducer.existingManifests()on every subsequent commit, so the failure is permanent.Fix
manifest.go:1410-1414: replace thefile.Version() != m.versionexact-match gate withfile.Version() > m.version. Newer-than-writer inputs are still rejected — the v2 entry schema has no place for v3 fields such asfirst_row_id, and accepting them would silently drop data. Lower-than-writer inputs are accepted because the in-memorymanifestFileproduced from a v1 input (viamanifestFileV1.toFile()orNewManifestFile(1, ...)) already carries the spec's inheritance values —Content = dataandSeqNumber = MinSeqNumber = 0— so it can be encoded directly against the v2/v3 entry schema. The existingfirst_row_idassignment for data manifests withFirstRowIDValue == nilcovers v1 inputs in v3 lists without further changes.Testing
TestV2ManifestListAcceptsV1Manifests— round-trips a v1 manifest through a v2 manifest list and asserts that the decoded entry hascontent = dataandsequence_number = min_sequence_number = 0per spec inheritance.TestV3ManifestListAcceptsV1AndV2Manifests— writes a v1 data manifest and a v2 delete manifest into a v3 list; asserts inheritance for the v1 entry, asserts thatfirst_row_idis assigned to the v1 data manifest and is left unset on the v2 delete manifest (data-only assignment per the existing v3 writer rules).TestV2ManifestListRejectsV3Manifests— confirms the downgrade direction is still blocked.TestWriteManifestListClosesWriterOnError— existing test updated to drive itsAddManifestsfailure path through a v3-in-v2 input (still rejected) instead of v1-in-v2.Confirmed failing on
mainbefore the fix and passing after.go test ./...passes.Fixes #1029