Feature/auto [infer-schema & auto-directory ingestion] #696

Merged
merged 43 commits into from
Mar 22, 2021
mccrearyd
Contributor

This branch makes it very easy to write dataset auto-ingestion code. It currently supports image-classification directories structured as class_folder -> images.

It also removes the from_directory method on hub.Dataset and replaces it with from_path.
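
To make the class_folder -> images layout concrete, here is a minimal standard-library sketch of how such a directory could be scanned into class names and labeled samples. It is not the code in hub/auto; the function name `scan_classification_dir` and the set of accepted image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; the PR does not state which ones it accepts.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def scan_classification_dir(root):
    """Scan a class_folder -> images layout (hypothetical helper).

    Returns (class_names, samples), where samples is a list of
    (image_path, label_index) pairs and label_index points into class_names.
    """
    root = Path(root)
    # Each immediate subdirectory is treated as one class; sorting keeps
    # the label indices stable across runs and platforms.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for path in sorted((root / name).iterdir()):
            # Skip non-image files such as notes or metadata.
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, label))
    return class_names, samples
```

Sorting the folder names is one simple way to get deterministic label indices; an actual ingestion routine may assign labels differently.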

@mccrearyd mccrearyd requested a review from AbhinavTuli March 19, 2021 00:02
@github-actions

Locust summary

Git references

Initial: 1a62a6c
Terminal: 9cef57c

hub/auto/util.py
Changes:
hub/api/dataset.py
Changes:
hub/auto/computer_vision/classification.py
Changes:
hub/auto/infer.py
Changes:
  • Name: _find_root
    Type: function
    Changed lines: 29
    Total lines: 29
  • Name: infer_dataset
    Type: function
    Changed lines: 35
    Total lines: 35
hub/auto/tests/test_image_classification.py
Changes:

      @mccrearyd
      Contributor Author

      mccrearyd commented Mar 19, 2021

      The tests I wrote may fail on Windows... hopefully not, though.

      EDIT: aha! it didn't >:)

      @mccrearyd
      Contributor Author

      mccrearyd commented Mar 19, 2021

      Don't merge yet. I added some asserts that should fail; if they don't, there might be a problem. The earlier tests should not have passed, because Kaggle credentials are required to run them.

      @mynameisvinn
      Contributor

      I think many image datasets follow COCO- or VOC-style annotations. Should we take that into account?

      @mccrearyd
      Contributor Author

      > I think many image datasets follow COCO- or VOC-style annotations. Should we take that into account?

      Yes, but this is version 1. I'm not trying to support all datasets right off the bat. The goal is to get the feature into the hands of testers immediately so we can iterate. There will be many follow-up PRs.

      @mccrearyd
      Contributor Author

      mccrearyd commented Mar 20, 2021

      This is basically ready to merge. I rewrote the tests so they no longer use Kaggle datasets. Only 1 test failed, and it's outside my context (the master branch CI is failing). Should we merge it anyway? @mynameisvinn @AbhinavTuli @imshashank

      source: https://app.circleci.com/pipelines/github/activeloopai/Hub/2089/workflows/6f462b61-3e37-4529-a88a-73c55fe6d94a/jobs/3082

      @codecov

      codecov bot commented Mar 21, 2021

      Codecov Report

      Merging #696 (fb59b54) into master (c08b6af) will increase coverage by 0.14%.
      The diff coverage is 91.11%.

      @@            Coverage Diff             @@
      ##           master     #696      +/-   ##
      ==========================================
      + Coverage   89.09%   89.23%   +0.14%     
      ==========================================
        Files          58       63       +5     
        Lines        4299     4366      +67     
      ==========================================
      + Hits         3830     3896      +66     
      - Misses        469      470       +1     
      Impacted Files                                 Coverage Δ
      hub/auto/infer.py                              82.85% <82.85%> (ø)
      hub/auto/computer_vision/classification.py     92.15% <92.15%> (ø)
      hub/auto/util.py                               95.34% <95.34%> (ø)
      hub/api/dataset.py                             91.70% <100.00%> (+2.12%) ⬆️
      hub/auto/__init__.py                           100.00% <100.00%> (ø)
      hub/auto/computer_vision/__init__.py           100.00% <100.00%> (ø)
      hub/store/metastore.py                         85.04% <0.00%> (-5.61%) ⬇️
      ... and 2 more

      Δ = absolute <relative> (impact), ø = not affected, ? = missing data
      Last update c08b6af...fb59b54.

      @mccrearyd
      Contributor Author

      Nice, I see you fixed the tests! @AbhinavTuli good stuff

      @mccrearyd mccrearyd merged commit 9dfd4c5 into master Mar 22, 2021
      @kristinagrig06 kristinagrig06 deleted the feature/auto branch May 31, 2021 13:00