Abstract: The success of a machine learning project depends on many components: data, algorithms, and the hardware backend. Deploying a machine learning system to production is further complicated by the glaring gap between the development and production environments.
To reduce the overhead cost of communication and collaboration, many companies expect their data scientists and ML engineers to own their projects end-to-end, from data management to modeling to deployment, forcing them to learn tools out of their comfort zones.
In the first part of the workshop, we will cover the challenges of different phases of productionizing machine learning models, as well as the gap between the development and production environments. We will discuss various solutions to address the gap.
In the second half, we will walk through a hands-on tutorial on how to use Metaflow to push development code from a local machine to production on AWS Batch with a single line of code.
Lesson 1: Common myths of machine learning in production
Common misunderstandings of machine learning in production by companies in the early phase of ML adoption.
Lesson 2: The gap between development and production environments
We’ll discuss how the development and production phases differ in their requirements and in the tools used for each.
Lesson 3: Solutions to the gap
We’ll discuss the existing solutions to close the gap, both technical (leveraging container and compiler technology) and organizational (team structure).
Lesson 4: Metaflow tutorials
We’ll show how infrastructure abstraction tools like Metaflow can help data scientists own their ML products end-to-end without having to worry about lower-level infrastructure details such as setting up Docker containers and Kubernetes.
We will start with a simple machine learning model in a notebook on a local machine, then we will show how to package and run it on AWS Batch by adding one line of code.
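To give a feel for what this looks like, here is a minimal sketch of a Metaflow flow. The class and step names are hypothetical placeholders; it assumes Metaflow is installed and AWS Batch is configured. The single added line is the `@batch` decorator, which tells Metaflow to run that step on AWS Batch instead of the local machine.

```python
# Sketch only: flow and step names are illustrative, not from the workshop.
from metaflow import FlowSpec, step, batch

class TrainFlow(FlowSpec):

    @step
    def start(self):
        # Prepare some toy training data locally.
        self.data = [1, 2, 3, 4]
        self.next(self.train)

    @batch(cpu=2, memory=4000)  # the one added line: run this step on AWS Batch
    @step
    def train(self):
        # A stand-in for model training.
        self.model = sum(self.data) / len(self.data)
        self.next(self.end)

    @step
    def end(self):
        print(f"trained model: {self.model}")

if __name__ == "__main__":
    TrainFlow()
```

Removing the `@batch` line runs the same flow entirely on the local machine, which is what makes the development-to-production transition a one-line change.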
The first part of the workshop doesn't have any requirements. For the second part, you should be comfortable with Python.
Bio: Chip Huyen is an engineer and founder working to develop tools that leverage real-time machine learning. Through her work with Snorkel AI, NVIDIA, and Netflix, she has helped some of the world’s largest organizations deploy machine learning systems. She teaches Machine Learning Systems Design at Stanford. She’s also published four bestselling Vietnamese books.