Skip to content

[Kernel]support RowTrackingDomainMetadata in Transaction.addDomainMetadata #4856

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

KaiqiJinWow
Copy link
Collaborator

@KaiqiJinWow KaiqiJinWow commented Jun 30, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Support the pass the system domain metadata RowTrackingDomainMetadata in the Transaction.addDomianMetadata API, which would be used to set the highWaterMark of row tracking feature. This provided high water mark must be larger or same as the current largest row id, otherwise we would throw the exception.

This is required for supporting IcebergCompatV3 table as we need to keep the Iceberg nextRowId and Delta highWaterMark in sync.

How was this patch tested?

Unit tests

Does this PR introduce any user-facing changes?

@KaiqiJinWow KaiqiJinWow self-assigned this Jun 30, 2025
@@ -93,6 +94,7 @@ public class TransactionImpl implements Transaction {
private int maxRetries;
private int logCompactionInterval;
private Optional<CRCInfo> currentCrcInfo;
private Optional<Long> providedRowIdHighWatermark = Optional.empty();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to domainMetadataState?

@@ -252,7 +261,8 @@ private long commitWithRetry(
protocol,
Optional.empty() /* winningTxnRowIdHighWatermark */,
dataActions,
resolvedDomainMetadatas);
resolvedDomainMetadatas,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do the check outside updateRowIdHighWatermarkIfNeeded so we don't even need to change the signature of pdateRowIdHighWatermarkIfNeeded

Comment on lines +51 to +52
Set<String> SUPPORTED_SYSTEM_DOMAINS = new HashSet<>();
SUPPORTED_SYSTEM_DOMAINS.add(RowTrackingMetadataDomain.DOMAIN_NAME);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a static variable

@@ -74,18 +74,21 @@ public class ConflictChecker {

// Helper states during conflict resolution
private Optional<Long> lastWinningRowIdHighWatermark = Optional.empty();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this into the private final section

}
}
});

if (currRowIdHighWatermark.get() != prevRowIdHighWatermark) {
// If the txn builder has explicitly provided a row ID high watermark, we should use that value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do a conflict resolution when the user provides a high watermark? I think the intention when they provide a high watermark is the connector wants to explicitly dictate the high watermark and the commit contents. Rebasing when conflicts could create unintended problems and can make the code too complex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants