fix(manifest): allow ManifestListWriter to reference lower-version manifests #1030

Merged
zeroshade merged 1 commit into apache:main from Jeffail:fix-manifest-list-mixed-versions
May 6, 2026

Contributor

Jeffail commented on May 6, 2026

Problem

ManifestListWriter.AddManifests requires every input manifest file's version to match the writer's version exactly, which contradicts the Iceberg spec: a v2 manifest list must be able to reference v1 manifest files so that v1 tables can be upgraded without rewriting historical manifests. Java's ManifestListWriter handles this; iceberg-go does not.

The practical symptom for downstream users is that any commit to a table whose metadata has been upgraded from v1 to v2 — whether via Transaction.UpgradeFormatVersion or out-of-band by another engine — fails with invalid argument: ManifestListWriter only supports version 2 manifest files. The v1 manifest files written before the upgrade are never rewritten and continue to surface via snapshotProducer.existingManifests() on every subsequent commit, so the failure is permanent.

Fix

manifest.go:1410-1414: replace the file.Version() != m.version exact-match gate with file.Version() > m.version. Newer-than-writer inputs are still rejected — the v2 entry schema has no place for v3 fields such as first_row_id, and accepting them would silently drop data. Lower-than-writer inputs are accepted because the in-memory manifestFile produced from a v1 input (via manifestFileV1.toFile() or NewManifestFile(1, ...)) already carries the spec's inheritance values — Content = data and SeqNumber = MinSeqNumber = 0 — so it can be encoded directly against the v2/v3 entry schema. The existing first_row_id assignment for data manifests with FirstRowIDValue == nil covers v1 inputs in v3 lists without further changes.
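The gate described above can be sketched in isolation. This is a minimal, self-contained illustration and not the code in manifest.go: `checkManifestVersion`, `ErrInvalidArgument`, and the error text are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInvalidArgument is a hypothetical stand-in for the library's
// invalid-argument error.
var ErrInvalidArgument = errors.New("invalid argument")

// checkManifestVersion is a hypothetical helper that mirrors the relaxed
// gate: a manifest file may be at or below the list writer's version,
// but never above it.
func checkManifestVersion(writerVersion, fileVersion int) error {
	if fileVersion > writerVersion {
		return fmt.Errorf("%w: v%d manifest list cannot reference a v%d manifest file",
			ErrInvalidArgument, writerVersion, fileVersion)
	}
	return nil
}

func main() {
	cases := []struct{ writer, file int }{{2, 1}, {2, 2}, {2, 3}, {3, 1}}
	for _, tc := range cases {
		err := checkManifestVersion(tc.writer, tc.file)
		fmt.Printf("writer=v%d file=v%d err=%v\n", tc.writer, tc.file, err)
	}
}
```

Under the previous exact-match check, the `{2, 1}` and `{3, 1}` cases would have been rejected as well; only the `{2, 3}` case is rejected by the relaxed comparison.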

Testing

  • TestV2ManifestListAcceptsV1Manifests — round-trips a v1 manifest through a v2 manifest list and asserts that the decoded entry has content = data and sequence_number = min_sequence_number = 0 per spec inheritance.
  • TestV3ManifestListAcceptsV1AndV2Manifests — writes a v1 data manifest and a v2 delete manifest into a v3 list; asserts inheritance for the v1 entry, asserts that first_row_id is assigned to the v1 data manifest and is left unset on the v2 delete manifest (data-only assignment per the existing v3 writer rules).
  • TestV2ManifestListRejectsV3Manifests — confirms the downgrade direction is still blocked.
  • TestWriteManifestListClosesWriterOnError — existing test updated to drive its AddManifests failure path through a v3-in-v2 input (still rejected) instead of v1-in-v2.

Confirmed failing on main before the fix and passing after. go test ./... passes.
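What the first two tests assert can be illustrated without the library. The sketch below assumes a trimmed-down entry type: `listEntry`, `fromV1`, and `assignFirstRowIDs` are hypothetical names, and advancing the row-ID counter by each manifest's row count is a simplifying assumption for the example, not a description of the writer's actual bookkeeping.

```go
package main

import "fmt"

type content int

const (
	contentData content = iota
	contentDeletes
)

// listEntry is a hypothetical, trimmed-down manifest list entry.
type listEntry struct {
	Version      int
	Content      content
	SeqNumber    int64
	MinSeqNumber int64
	RowCount     int64
	FirstRowID   *int64 // nil means "not assigned"
}

// fromV1 builds an entry for a v1 manifest using the inheritance defaults
// named in the PR: content = data, sequence_number = min_sequence_number = 0.
func fromV1(rowCount int64) listEntry {
	return listEntry{Version: 1, Content: contentData, RowCount: rowCount}
}

// assignFirstRowIDs gives a first_row_id to every data manifest that lacks
// one and leaves delete manifests untouched. It returns the next free ID.
// Advancing by RowCount is an assumption made for this sketch.
func assignFirstRowIDs(entries []listEntry, next int64) int64 {
	for i := range entries {
		e := &entries[i]
		if e.Content != contentData || e.FirstRowID != nil {
			continue
		}
		id := next
		e.FirstRowID = &id
		next += e.RowCount
	}
	return next
}

func main() {
	entries := []listEntry{
		fromV1(100), // v1 data manifest
		{Version: 2, Content: contentDeletes, SeqNumber: 5, MinSeqNumber: 5, RowCount: 10},
	}
	next := assignFirstRowIDs(entries, 1000)
	fmt.Println(*entries[0].FirstRowID, entries[1].FirstRowID == nil, next)
}
```

The v1 entry needs no special casing in the assignment loop: because it already carries content = data and a nil first row ID, it follows the same path as any other unassigned data manifest.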

Fixes #1029

Jeffail requested a review from zeroshade as a code owner on May 6, 2026 14:33
@zeroshade
Member

Can you rebase with main? That'll fix the failing CI

…nifests

Per the Iceberg spec, a v2 manifest list may reference v1 manifest files
(and a v3 list may reference v1 or v2 manifests) so that a table can be
upgraded without rewriting historical manifests. The existing exact-match
gate in AddManifests rejects this, breaking every commit on a table whose
metadata has been upgraded from v1 to v2 via Transaction.UpgradeFormatVersion
or out-of-band by another engine.

Replace the gate with file.Version() > m.version. Newer-than-writer inputs
remain rejected because the v2 entry schema has no place for v3 fields such
as first_row_id. Lower-than-writer inputs are accepted because the in-memory
manifestFile produced for v1 inputs already carries the spec's inheritance
values (Content=data, SeqNumber=MinSeqNumber=0) and can be encoded directly
against the v2/v3 entry schema. The existing first_row_id assignment for
data manifests with nil FirstRowIDValue covers v1 inputs in v3 lists.

Adds round-trip tests for v1-in-v2 and v1+v2-in-v3, plus a regression test
for the still-rejected v3-in-v2 downgrade direction.

Jeffail force-pushed the fix-manifest-list-mixed-versions branch from d88b718 to 2e7b874 on May 6, 2026 17:05
zeroshade merged commit dfc5851 into apache:main on May 6, 2026
14 checks passed
Jeffail added a commit to redpanda-data/connect that referenced this pull request May 7, 2026
Pulls in apache/iceberg-go#1030, which fixes the upstream
ManifestListWriter rejecting v1 manifest files when writing a v2
manifest list. Without this, every commit on a table that was upgraded
from v1 to v2 (whether via Transaction.UpgradeFormatVersion in the
committer or out-of-band by another engine) failed with
"ManifestListWriter only supports version 2 manifest files" because the
historical v1 manifests surfaced through existingManifests() on every
commit and the v2 writer rejected them.

Adds a regression test that pins the upstream behaviour: a v1
ManifestFile passed to WriteManifestList(2, ...) must round-trip and
the decoded entry must inherit content=data and
sequence_number=min_sequence_number=0 per the Iceberg spec.

Development

Successfully merging this pull request may close these issues.

bug: ManifestListWriter rejects lower-version manifests, blocking v1->v2 table upgrades

2 participants