Skip to content

[668] Support bucket partition transform for Iceberg Sources and Delta Targets #670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

the-other-tim-brown
Copy link
Contributor

What is the purpose of the pull request

Addresses #668
Adds support for handling Iceberg's Bucket partitioning when converting to Delta Lake.

Brief change log

  • Skips handling of Void partition which will always return a null value
  • Handle bucket partitioning by adding a new field in InternalPartitionField to track partitioning options like number of buckets
  • Delta Lake support is added by using generated column to represent the partition similar to what we do for date based partitioning

Verify this pull request

  • Added unit tests for conversion cases
  • Added integration test for Iceberg to Delta with bucket partitioning and void partition to test case reported

@@ -47,4 +49,6 @@ public class InternalPartitionField {
@Builder.Default List<String> partitionFieldNames = Collections.emptyList();
// An enum describing how the source data was transformed into the partition value
PartitionTransformType transformType;
// Transform options such as number of buckets in the BUCKET transform type
@Builder.Default Map<String, Object> transformOptions = Collections.emptyMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of Object, should we add a new class known as InternalValue ?

InternalValue { 
  InternalType type; 
  Object value;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, I made this similar to the metadata in InternalSchema. Another option I considered is to change PartitionTransformType into a class which can then have its own instance variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is okay for now.

@the-other-tim-brown the-other-tim-brown force-pushed the 668-bucket-partition-transform branch from f6077b0 to ea1b532 Compare April 1, 2025 01:11
@the-other-tim-brown the-other-tim-brown merged commit dafe24a into apache:main Apr 1, 2025
2 checks passed
@the-other-tim-brown the-other-tim-brown deleted the 668-bucket-partition-transform branch April 1, 2025 01:32
@vinishjail97 vinishjail97 mentioned this pull request Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants