
Abstract: A model registry is becoming an essential part of the machine learning technology stack. It helps a team keep track of its ML models, connect them to production environments, and manage the model lifecycle. In many cases, however, it requires an additional SaaS model registry service to store information about ML models, and this additional service leads to a divergence between the lifecycles of ML models and software applications. In this talk, we will show how ML engineers and data scientists can implement a model registry using open source technologies such as Git and GitHub/GitLab, and how they can manage ML models the same way as software applications, without any additional services.
We will show how:
Git can be used as the source of truth for models, model versions and model statuses
Model lifecycle can be managed through GitHub Pull Requests or GitLab Merge Requests
CI/CD systems can deliver ML models to production
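As a rough illustration of the first point, the sketch below derives a registry view purely from Git tags. The tag naming convention (`<model>@v<version>` for versions, `<model>#<stage>#<n>` for stage assignments) and the function name `build_registry` are assumptions made for this example and are not taken from the talk.

```python
import re
from collections import defaultdict

# Hypothetical tag convention (an assumption for illustration only):
#   <model>@v<major>.<minor>.<patch>   registers a model version
#   <model>#<stage>#<n>                assigns a stage; the highest n wins
VERSION_TAG = re.compile(r"^(?P<model>[\w\-/]+)@v(?P<version>\d+\.\d+\.\d+)$")
STAGE_TAG = re.compile(r"^(?P<model>[\w\-/]+)#(?P<stage>\w+)#(?P<n>\d+)$")


def build_registry(tags):
    """Build a registry view from (tag_name, commit_sha) pairs."""
    versions = defaultdict(dict)  # model -> {commit: version}
    stages = defaultdict(dict)    # model -> {stage: (n, commit)}

    for name, commit in tags:
        if m := VERSION_TAG.match(name):
            versions[m["model"]][commit] = m["version"]
        elif m := STAGE_TAG.match(name):
            n = int(m["n"])
            current = stages[m["model"]].get(m["stage"])
            # Keep only the most recent assignment for each stage.
            if current is None or n > current[0]:
                stages[m["model"]][m["stage"]] = (n, commit)

    registry = {}
    for model, by_commit in versions.items():
        registry[model] = {
            # Sort versions numerically, not lexicographically.
            "versions": sorted(
                by_commit.values(),
                key=lambda v: tuple(map(int, v.split("."))),
            ),
            # A stage resolves to the version tagged on the same commit.
            "stages": {
                stage: by_commit[commit]
                for stage, (_, commit) in stages[model].items()
                if commit in by_commit
            },
        }
    return registry


example_tags = [
    ("churn@v1.0.0", "a1"),
    ("churn@v1.1.0", "b2"),
    ("churn#prod#1", "a1"),
    ("churn#staging#1", "b2"),
    ("churn#prod#2", "b2"),  # promotes v1.1.0 to prod
]
registry = build_registry(example_tags)
```

In a real repository the tag and commit pairs could be read from `git for-each-ref refs/tags`; promoting a model then amounts to pushing a new tag, which a CI/CD pipeline can react to.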
Advanced, practical use cases include:
Mono-repositories with ML models and model zoos
Tracking large models and storing weight files in cloud storage such as S3, GCS and Azure Blob Storage
Model versioning
Connection to model deployment systems
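One common way to track large weight files alongside Git is to commit a small pointer file while the weights themselves live in cloud storage. The following is a minimal sketch of that idea under stated assumptions: the pointer layout, the `.ptr.json` suffix, the function name `write_pointer` and the bucket URL are all hypothetical and do not reproduce any specific tool's format. The upload itself is left out.

```python
import hashlib
import json
from pathlib import Path


def write_pointer(weights_path, remote_prefix, chunk_size=1 << 20):
    """Write a small JSON pointer next to a weights file and return it.

    The pointer (not the weights) is what would be committed to Git.
    """
    path = Path(weights_path)

    # Hash the file in chunks so large weights never sit fully in memory.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    sha = digest.hexdigest()

    pointer = {
        "path": path.name,
        "sha256": sha,
        "size": path.stat().st_size,
        # Content-addressed location; the prefix is a hypothetical bucket.
        "remote": f"{remote_prefix.rstrip('/')}/{sha}",
    }
    pointer_path = path.with_name(path.name + ".ptr.json")
    pointer_path.write_text(json.dumps(pointer, indent=2) + "\n")
    return pointer
```

For example, `write_pointer("model.bin", "s3://example-bucket/models")` would create `model.bin.ptr.json`. Because the remote location is derived from the content hash, the Git history of the pointer file doubles as the version history of the weights.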
The proposed model registry is based on software engineering best practices and the ideas of GitOps. This makes the model lifecycle compatible with the software application lifecycle, which simplifies operations across ML and software development teams. The talk requires basic knowledge of Git and GitHub/GitLab. After the talk, listeners will be able to implement fully functional model registries with popular open source tools, without any additional services.
Bio: Dmitry Petrov is an ex-Data Scientist at Microsoft with a Ph.D. in Computer Science and an active open source contributor. He wrote and open sourced the first version of DVC.org, a machine learning workflow management tool. He also implemented the wavelet-based image hashing algorithm (wHash) in ImageHash, an open source library for Python. Dmitry now works on tools for machine learning and ML workflow management as a co-founder and CEO of Iterative in San Francisco.