Possible options to store and serve ML models #1

@vemonet

DVC and CML (Continuous Machine Learning)

Build models using GitHub Actions or GitLab CI: https://cml.dev/

Version models with https://dvc.org/
See https://determined.ai/blog/building-an-enterprise-deep-learning-platform-2/

DVC is significantly more lightweight than Pachyderm, running locally and adding versioning on top of your local storage solution. DVC simply integrates into existing Git repositories to track the version of data that was used to run experiments. ML teams can also define and execute transformation pipelines with DVC; however, the biggest drawback of DVC is that those transformations run locally and are not automatically scaled to a cluster. Notably, DVC does not handle the storage of data, simply the versioning.
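To make the "versioning, not storage" point concrete, here is a minimal sketch of the general idea: a small pointer file that records a content hash of a large model file, so that only the pointer needs to be committed to Git. This is a toy illustration, not DVC's actual implementation or file format; the `.pointer.json` suffix, the field names, the use of SHA-256, and the helper names `write_pointer` and `is_unchanged` are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large model file need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small pointer file next to the data file (hypothetical format).

    The pointer is what would be committed to Git; the data file itself
    stays in whatever storage is already in use.
    """
    pointer = {
        "path": data_path.name,
        "sha256": file_sha256(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def is_unchanged(pointer_path: Path) -> bool:
    """Check whether the data file still matches the hash in its pointer."""
    pointer = json.loads(pointer_path.read_text())
    data_path = pointer_path.with_name(pointer["path"])
    return file_sha256(data_path) == pointer["sha256"]
```

Calling `write_pointer(Path("model.bin"))` after training and committing the resulting pointer file ties a Git revision to one exact version of the model, without putting the model itself in the repository.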

See also: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9264-how-to-build-efficient-ml-pipelines-from-the-startup-perspective.pdf

Explore Kubeflow?

OpenML to share models

Concepts: https://openml.github.io/OpenML/#concepts

See how to publish a dataset:
https://openml.github.io/openml-python/master/examples/30_extended/datasets_tutorial.html#sphx-glr-examples-30-extended-datasets-tutorial-py

Should we also publish tasks?

ScaNN library from Google

https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html

Clipper AI API

Clipper AI: Serve ML models (TensorFlow, PyTorch, scikit-learn...) through an HTTP REST API (no built-in OpenAPI support)
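As a rough sketch of what querying such a REST endpoint could look like from a client, the snippet below builds (but does not send) a JSON prediction request using only the standard library. The URL shape `http://<host>/<app-name>/predict` and the `{"input": ...}` body follow the Clipper quickstart as far as I recall, so treat both as assumptions to verify; the helper name `build_predict_request` and the app name `demo-app` are hypothetical.

```python
import json
import urllib.request


def build_predict_request(host: str, app_name: str, inputs: list) -> urllib.request.Request:
    """Build a POST request for a model-serving REST endpoint.

    Assumed endpoint shape: http://<host>/<app_name>/predict with a JSON
    body of the form {"input": [...]}. Check this against the server docs.
    """
    url = f"http://{host}/{app_name}/predict"
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and application name, for illustration only.
request = build_predict_request("localhost:1337", "demo-app", [1.0, 2.0, 3.0])
```

Passing `request` to `urllib.request.urlopen` would send it to a running server. Since no OpenAPI description is generated, the request and response shapes have to be documented separately.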

Pachyderm

Build, train, and deploy your data science workloads on whatever Kubernetes deployment you call home.

https://www.pachyderm.com/getting-started/

Machine Learning model databases

ModelDB

Open Source ML Model Versioning, Metadata, and Experiment Management

https://github.com/VertaAI/modeldb

Video presentation: https://databricks.com/fr/session/modeldb-a-system-to-manage-machine-learning-models

MLDB

https://mldb.ai/

MLDB (Machine Learning Database) is an open-source database designed for machine learning. You can install it wherever you want and send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs.
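To illustrate the "explore it using SQL over a RESTful API" part, here is a small sketch that builds the URL for a SQL query sent as a query-string parameter. The `/v1/query` route and the `q` parameter are what I understand from the MLDB documentation and should be treated as assumptions; the base URL, the dataset name `iris`, and the helper name `build_query_url` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url: str, sql: str) -> str:
    """Build a GET URL that carries a SQL query as a URL-encoded parameter.

    Assumed route: <base_url>/v1/query?q=<sql>. Verify against the MLDB docs.
    """
    return f"{base_url.rstrip('/')}/v1/query?" + urlencode({"q": sql})


# Hypothetical server address and dataset name, for illustration only.
url = build_query_url("http://localhost:8080", "SELECT * FROM iris LIMIT 5")

# Round-trip check: the SQL text survives URL encoding unchanged.
assert parse_qs(urlsplit(url).query)["q"] == ["SELECT * FROM iris LIMIT 5"]
```

The same pattern (plain HTTP plus JSON or query parameters) is what would make MLDB usable from any language without a dedicated client library.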

Labels: enhancement, question
