Skip to content

feat: Update with_commit_info() API to use create_one() API #997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

joonyoo181
Copy link
Collaborator

@joonyoo181 joonyoo181 commented Jun 2, 2025

Fixes #762

What changes are proposed in this pull request?

To append Commit Provenance Information to a transaction, the current implementation provides an API called with_commit_info() that takes in the data forengine_commit_info in the form of EngineData. Instead, this PR proposes to change the name into with_engine_commit_info() and take in a HashMap<String, String> as data. This data will be materialized into EngineData during commit() using the new create_one() API.

Additionally, this PR removes old test cases that validate the schema of engine_commit_info.

This PR affects the following public APIs

renaming with_commit_info(EngineData) --> with_engine_commit_info(HashMap<String, String>) in transactions.rs

How was this change tested?

I ran cargo nextest run and passed all tests

Copy link

codecov bot commented Jun 2, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 6 lines in your changes missing coverage. Please review.

Project coverage is 84.56%. Comparing base (717fd48) to head (d2943f0).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/transaction.rs 85.71% 2 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #997      +/-   ##
==========================================
- Coverage   84.68%   84.56%   -0.12%     
==========================================
  Files          92       92              
  Lines       23025    22769     -256     
  Branches    23025    22769     -256     
==========================================
- Hits        19499    19255     -244     
+ Misses       2564     2557       -7     
+ Partials      962      957       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Jun 2, 2025
@joonyoo181 joonyoo181 marked this pull request as ready for review June 2, 2025 16:54
@joonyoo181 joonyoo181 requested review from zachschuermann, scovich and nicklan and removed request for scovich June 2, 2025 16:54
@joonyoo181
Copy link
Collaborator Author

Error::InvalidCommitInfo is no longer being used in the code. Previously, this was thrown if the user passed in an invalid engine_commit_into (e.g. wrong schema) and we had to validate since it was of type EngineData. With the new PR, we don't need to validate since users are constrained by the new APIs to modify CommitInfo. Should this be deleted?

Copy link
Collaborator

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome start! in an effort to keep changes minimal/targeted, how about we just start with:

  1. (already implemented) with_operation(operation: String)
  2. (new!) with_engine_commit_info(engine_commit_info: HashMap<String, String>)

and create a separate issue/PR for with_operation_parameters(operation_parameters: HashMap<String, String>?

I don't think we will want to make kernel version a public/builder-level API and it isn't clear to me that there's utility in the timestamp case either so let's hold off on those.

@joonyoo181 joonyoo181 requested a review from zachschuermann June 2, 2025 19:12
Copy link
Collaborator

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments!

Comment on lines +181 to +183
// The engine is required to provide commit info before committing the transaction. If the
// engine would like to omit engine-specific commit info, it can do so by passing pass an
// empty map.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be true anymore right? can we make the API just consume something like an iterable (string, string) and we just (internally) accumulate in a hashmap?

// The engine is required to provide commit info before committing the transaction. If the
// engine would like to omit engine-specific commit info, it can do so by passing pass an
// empty map.
pub fn with_commit_info(mut self, engine_commit_info: HashMap<String, String>) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's flag this public API naming and discuss! perhaps we want with_engine_commit_info (since this is the engineCommitInfo field inside the commitInfo action)? cc @nicklan @scovich

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think with_engine_commit_info would be a better name too, but then I wasn't sure how to determine whether or not CommitInfo has been appended to the transaction. Perhaps we could introduce another call called add_commit_info() that sets a boolean flag?

Comment on lines 313 to 331
let hashmap_to_scalar = |hm: Option<&HashMap<String, String>>| -> DeltaResult<Scalar> {
match hm {
Some(map) => {
let pairs = map.iter().map(|(k, v)| (k.clone(), v.clone()));
let map_data = MapData::try_new(
MapType::new(DataType::STRING, DataType::STRING, false),
pairs,
)?;
Ok(Scalar::Map(map_data))
}
None => {
let map_data = MapData::try_new(
MapType::new(DataType::STRING, DataType::STRING, false),
std::iter::empty::<(String, String)>(),
)?;
Ok(Scalar::Map(map_data))
}
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could/should implement From<HashMap> or rather something like From<Iterator<(K, V)>> for Scalar?

Copy link
Collaborator Author

@joonyoo181 joonyoo181 Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joonyoo181 joonyoo181 changed the title feat: Replace with_commit_info() API with multiple APIs to provide CommitInfo data and materialize it during commit() feat: Update with_commit_info() API to use create_one() API Jun 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Change that require a major version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove the need for transaction.with_commit_info() (after null_row work lands)
2 participants