-
Notifications
You must be signed in to change notification settings - Fork 2
ingest - implement and make use of connector classes #1519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5660ddf
to
0d10fd6
Compare
535a463
to
39ec8d0
Compare
2d3b04a
to
daa5c27
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1519 +/- ##
==========================================
+ Coverage 70.69% 71.19% +0.50%
==========================================
Files 125 129 +4
Lines 6504 6690 +186
Branches 742 732 -10
==========================================
+ Hits 4598 4763 +165
- Misses 1750 1775 +25
+ Partials 156 152 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
If we have different apis for these connectors, at some points we need to be able to declare what we expect to be able to do. Maybe this is a little stringent right now (before getting a subregistry of our default one, no api is available), so might need some reworking. I think the utility of this is more shown when being used in ingest.
Implement connector objects and tests for the various ingest sources
two points here - "ingest_datastore" connector that's decoupled from any specific storage. The idea is that user could set up storage for ingest with something like what's sort of outlined in #1555 - "storage" connectors underneath - I really needed a connector that both was non-versioned and also had some concept of listing things after a prefix. This maybe will be scrapped in the future. Alex and I have talked about how maybe all connectors should move towards "paths" (sort of RESTful) which would make this maybe redundant. But for now, I needed it.
reduce boilerplate. this could have gone either direction, but there were cases where "pull_versioned" needed a specific implementation so this felt more generalized.
d7e8026
to
5c69e2a
Compare
5c69e2a
to
7c1ad6c
Compare
dcpy/lifecycle/ingest/validate.py
Outdated
def validate_against_existing_versions( | ||
ds: recipes.Dataset, filepath: Path | ||
) -> ArchiveAction: | ||
def validate_against_existing_versions(ds: recipes.Dataset, filepath: Path) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename this function to something like is_new_version
and also update the doctsring where it says that the config file will be updated in line 16? From my understanding, the config file will no longer record the date checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow the docstring is way inaccurate. Yeah will update.
Maybe compare_to_existing_version
? is_equivalent_to_existing_version
? I think the former maybe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the scope is more "check if it's new (against the existing versions in recipes), then if it isn't validate the data"... which sort of leads me back to the current name. But to that point, I think the scope of this function is a little confusing.
It also has a dependency on recipes
which needs to be removed. Let me add a commit to try to clean this up and we can see if we think it belongs or if it's a little too weird to try to fit into this PR
9656b3e
to
bf482ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed at high level -- LGTM!
91bd355
to
25d385f
Compare
25d385f
to
25ada63
Compare
Closes #1515
Multiple things big and small going on here. I tried to be quite detailed in commit descriptions, so look there as you go by commit. Definitely go by commit, I think it's relatively sane
all the connector-related ingest changes are in one commit (9th) - if you're unsure of how the functionality in a specific commit is relevant, I recommend checking that one as a reference.
Main takeaways
My biggest open questions
edm.recipes
vsingest_datastore
-> should this new connector just go in recipes? Should it become one with the current recipes connector?library
. Also don't have to worry about refactoring name (ingest_datastore is generic, better to expose slightly to users, whereasedm.recipes
is very much OUR datastore)