-
Notifications
You must be signed in to change notification settings - Fork 84
feat: CRC for PM replay #982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #982 +/- ##
==========================================
- Coverage 85.07% 85.07% -0.01%
==========================================
Files 90 90
Lines 23387 23410 +23
Branches 23387 23410 +23
==========================================
+ Hits 19896 19915 +19
Misses 2479 2479
- Partials 1012 1016 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
pub(crate) protocol: Option<Protocol>, | ||
pub(crate) metadata: Option<Metadata>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fixing an incorrect usage of Default
--> we want Default to just set up 'empty' P/M for us to fill in after visiting - not to generate some default P/M itself.
}); | ||
|
||
let json = crc_json.to_string().into_bytes(); | ||
// let (engine, store) = DefaultEngine::new_memory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyone know why I can't seem to get the extension trait working here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know now and can remove the commented out line (sadly)
@@ -94,14 +93,46 @@ pub(crate) struct DeletedRecordCountsHistogram { | |||
pub(crate) deleted_record_counts: Vec<i64>, | |||
} | |||
|
|||
#[derive(Debug, Clone, PartialEq, Eq)] | |||
pub(crate) struct ProtocolMetadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we're just using this here to return a protocol and metadata? We have lots of other places where we pair them together as either arguments, or a (protocol, metadata)
. I'd say here can we just do:
fn try_protocol_and_metadata_from_crc(path: FileMeta, engine: &dyn Engine) -> DeltaResult<(Protocol, Metadata)> {}
We could pass through everything and unify on a single struct in the future, but for now this is just confusing
/// protocol/metadata cannot be found in the CRC file (specification requires both to be | ||
/// present). | ||
pub(crate) fn try_from_crc(path: FileMeta, engine: &dyn Engine) -> DeltaResult<Self> { | ||
let json = engine.json_handler(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
micro_nit:
let json = engine.json_handler(); | |
let json_handler = engine.json_handler(); |
}); | ||
|
||
let json = crc_json.to_string().into_bytes(); | ||
// let (engine, store) = DefaultEngine::new_memory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know now and can remove the commented out line (sadly)
let protocol_metadata_log_segment: Cow<'_, Self> = | ||
if let Some(latest_crc_file) = &self.latest_crc_file { | ||
let mut log_segment = self.clone(); | ||
log_segment.checkpoint_version = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we pruned here so we know that any crc has to be after the checkpoint?
// 2. construct a "protocol-metadata log segment" | ||
// a) if we have a CRC file, we only need commits (CRC_version, target_version] | ||
// b) if we don't have a CRC, we proceed as normal | ||
let protocol_metadata_log_segment: Cow<'_, Self> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to add a start_version: Option<Version>
to replay_for_metadata
. In that case, find_commit_cover
could quite easily just drop any versions less than start_version
, and we wouldn't need this code really at all (and could avoid the clone)
if let Some(latest_crc_file) = &self.latest_crc_file { | ||
let path = &latest_crc_file.location; | ||
let pm = ProtocolMetadata::try_from_crc(path.clone(), engine)?; | ||
metadata_opt.get_or_insert(pm.metadata); | ||
protocol_opt.get_or_insert(pm.protocol); | ||
} | ||
|
||
Ok((metadata_opt, protocol_opt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let Some(latest_crc_file) = &self.latest_crc_file { | |
let path = &latest_crc_file.location; | |
let pm = ProtocolMetadata::try_from_crc(path.clone(), engine)?; | |
metadata_opt.get_or_insert(pm.metadata); | |
protocol_opt.get_or_insert(pm.protocol); | |
} | |
Ok((metadata_opt, protocol_opt)) | |
if let Some(latest_crc_file) = &self.latest_crc_file { | |
let path = &latest_crc_file.location; | |
let pm = ProtocolMetadata::try_from_crc(path.clone(), engine)?; | |
Ok((metadata_opt.or(Some(pm.metadata)), protocol_opt.or(Some(pm.protocol)))) | |
} else { | |
Ok((metadata_opt, protocol_opt)) | |
} |
@@ -164,4 +166,15 @@ impl DefaultEngineExtension for DefaultEngine<TokioBackgroundExecutor> { | |||
TokioBackgroundExecutor::new().into(), | |||
)) | |||
} | |||
|
|||
fn new_memory() -> (Arc<DefaultEngine<TokioBackgroundExecutor>>, Arc<InMemory>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably can remove this now right, or move it to the kernel
crate
What changes are proposed in this pull request?
Leverage CRC files for P+M replay. We do this by modifying our PM log replay to:
How was this change tested?
new UTs