Possible options to store and serve ML models #1
DVC and CML (Continuous Machine Learning)
Build models using GitHub Actions or GitLab CI: https://cml.dev/
Version models with https://dvc.org/
See https://determined.ai/blog/building-an-enterprise-deep-learning-platform-2/
DVC is significantly more lightweight than Pachyderm, running locally and adding versioning on top of your local storage solution. DVC simply integrates into existing Git repositories to track the version of data that was used to run experiments. ML teams can also define and execute transformation pipelines with DVC; however, the biggest drawback of DVC is that those transformations run locally and are not automatically scaled to a cluster. Notably, DVC does not handle the storage of data, simply the versioning.
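The "versioning, not storage" idea in the quote above can be illustrated with a small sketch. This is not DVC's implementation or file format: the `.pointer.json` suffix, the JSON layout, and the function names are all hypothetical, and SHA-256 is chosen here only for illustration. The point is that Git would track the tiny pointer file while the large model file lives wherever you already store it.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small JSON pointer next to the data file (hypothetical format).

    Only this pointer would be committed to Git; the data itself stays
    in the existing storage location.
    """
    pointer = {
        "path": data_path.name,
        "sha256": file_sha256(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def is_unchanged(pointer_path: Path) -> bool:
    """Return True if the data file still matches the hash recorded in the pointer."""
    pointer = json.loads(pointer_path.read_text())
    data_path = pointer_path.with_name(pointer["path"])
    return file_sha256(data_path) == pointer["sha256"]
```

Usage would look like calling `write_pointer(Path("model.bin"))` after training, committing the resulting pointer file, and calling `is_unchanged(...)` later to check whether the model on disk is still the version that was recorded.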
Explore Kubeflow?
OpenML to share models
Concepts: https://openml.github.io/OpenML/#concepts
See how to publish a dataset:
https://openml.github.io/openml-python/master/examples/30_extended/datasets_tutorial.html#sphx-glr-examples-30-extended-datasets-tutorial-py
Should we also publish tasks?
ScaNN library from Google
https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html
- Code on GitHub: https://github.com/google-research/google-research/tree/master/scann
- Notebook: https://github.com/google-research/google-research/blob/master/scann/docs/example.ipynb
- More docs: https://github.com/google-research/google-research/blob/master/scann/docs/algorithms.md
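For context on what ScaNN speeds up: the underlying problem is finding the database vectors with the largest inner product against a query. The sketch below is the exact brute-force baseline in plain Python, not ScaNN's algorithm or API; the function names are hypothetical. It scans every vector, which is the cost an approximate search library aims to avoid on large collections.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_inner_product(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Exact search: score every database vector, keep the k highest scores.

    Returns (index, score) pairs sorted by descending score.
    """
    scored = ((inner_product(query, vec), idx) for idx, vec in enumerate(database))
    best = heapq.nlargest(k, scored)
    return [(idx, score) for score, idx in best]
```

For example, `top_k_inner_product([1.0, 0.0], vectors, k=2)` returns the indices and scores of the two vectors in `vectors` that best match the query.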
Clipper AI API
Clipper AI: serve ML models (TensorFlow, PyTorch, scikit-learn...) through an HTTP REST API (no built-in OpenAPI support)
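To make "serve a model through an HTTP REST API" concrete, here is a minimal sketch using only the Python standard library. It does not use Clipper and does not reflect Clipper's API: the `/predict` path, the `{"input": [...]}` payload shape, and the `predict` function (a stand-in that just sums its inputs) are all assumptions made for illustration.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: returns the sum of the inputs."""
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = predict(payload["input"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with an 'input' list")
            return
        body = json.dumps({"output": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def start_server(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(server: HTTPServer, features):
    """Send one prediction request to the running server and return the parsed JSON."""
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps({"input": features}),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

A dedicated serving system adds what this sketch leaves out, such as model containers, batching, and multiple frameworks behind one endpoint. An OpenAPI description would have to be written separately in either case.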
Pachyderm
Build, train, and deploy your data science workloads on whatever Kubernetes deployment you call home.
https://www.pachyderm.com/getting-started/
Machine Learning model databases
ModelDB
Open Source ML Model Versioning, Metadata, and Experiment Management
https://github.com/VertaAI/modeldb
Video presentation: https://databricks.com/fr/session/modeldb-a-system-to-manage-machine-learning-models
MLDB
MLDB (Machine Learning Database) is an open-source database designed for machine learning. You can install it wherever you want and send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs.