
Abstract: The success of a machine learning project depends on many components: data, algorithms, and the hardware backend. Bringing a machine learning system into production is further complicated by the wide gap between the development and production environments.
To reduce the overhead of communication and collaboration, many companies expect their data scientists and ML engineers to own their projects end-to-end, from data management to modeling to deployment, which forces them to learn tools outside their comfort zones.
In the first half of the workshop, we will cover the challenges that arise in the different phases of productionizing machine learning models, as well as the gap between the development and production environments, and discuss several approaches to closing that gap.
In the second half, we will walk through a hands-on tutorial on using Metaflow to push development code from a local machine to production on AWS Batch with a single line of code.
Bio: Ville has been developing infrastructure for machine learning for over two decades. He has worked as an ML researcher in academia and as a leader at a number of companies, including Netflix, where he led the ML infrastructure team that created Metaflow, a popular open-source framework for data science infrastructure. He is the co-founder and CEO of Outerbounds, a company developing modern human-centric ML infrastructure. He is also the author of an upcoming book, Effective Data Science Infrastructure, published by Manning.