-
Notifications
You must be signed in to change notification settings - Fork 84
refactor!: txn-specific write_metadata_schema #1021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor!: txn-specific write_metadata_schema #1021
Conversation
engine: &dyn Engine, | ||
write_metadata: impl Iterator<Item = &'a dyn EngineData> + Send + 'a, | ||
) -> impl Iterator<Item = DeltaResult<Box<dyn EngineData>>> + Send + 'a { | ||
let evaluation_handler = engine.evaluation_handler(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review with whitespace hidden
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1021 +/- ##
==========================================
+ Coverage 84.70% 84.71% +0.01%
==========================================
Files 92 92
Lines 23067 23266 +199
Branches 23067 23266 +199
==========================================
+ Hits 19538 19710 +172
- Misses 2568 2578 +10
- Partials 961 978 +17 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding how responsibility is divided between kernel and engine here? It seems like the engine provides the write schema? I guess that makes sense (e.g. engine can choose not to write columns that have default values), but we don't validate that it's a subset of the table schema?
fn generate_adds<'a>( | ||
&'a self, | ||
engine: &dyn Engine, | ||
write_metadata: impl Iterator<Item = &'a dyn EngineData> + Send + 'a, | ||
) -> impl Iterator<Item = DeltaResult<Box<dyn EngineData>>> + Send + 'a { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we take &'a self
, I think we can remove the named lifetimes?
At worst we might need + '_
for the iterators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, the compiler is yelling that anonymous lifetimes are unstable in impl trait
..
I was attempting to 'push down' the write metadata schema (the schema which we agree is how the engine communicates the metadata about files it wrote during appends) from being global to now per-table. The engine/kernel interaction is intended to be:
To be clear this is an unnecessary code change in isolation (the schema is constant for all transactions currently) but this paves the way for us to add the |
Who provides "our |
we (kernel) do(es)! this is the Transaction::write_metadata_schema -> when we do stats in the future this will be generated from the table schema |
What changes are proposed in this pull request?
On the way to supporting
stats
in writes, we must allow for thewrite_metadata_schema
to be specified per-txn/table instead of globally since the schema for write_metadata_schema will eventually include a stats column which is a function of the table schema.Concretely, this PR
transaction::get_write_metadata_schema()
write_metadata_schema()
toWriteContext
(and plumbs the schema through)Note that the actual naming will change in #1019
This PR affects the following public APIs
The (static)
transaction::get_write_metadata_schema()
is now a method:WriteContext::write_metadata_schema()
(and theWriteContext
is derived from a specificTransaction
.How was this change tested?
minor modifications to existing UT