Abstract: Data scientists shouldn't need to know much about Kubernetes (K8s), yet many are interested in it. Knowledge of K8s can be a big productivity booster as well as a big differentiator in the job market. So, we're here to empower your K8s journey! This workshop is a gentle introduction for data scientists who want to take their production DS/ML chops to the next level.
In this workshop, you’ll learn:
- What K8s is and what it isn’t;
- Why you should care;
- The differences between K8s, Docker, and other moving parts (such as Terraform);
- How to understand the lifecycle of a K8s project;
- What the core concepts of deploying K8s workloads are;
- Which Kubernetes APIs are most relevant to data science workloads.
Module 1: Develop locally in a notebook
- Iterate on a machine learning task/model in a familiar Jupyter environment; we will provide a hosted environment that participants can access in a web browser.
- This module sets the stage in a familiar environment, where data analysts and scientists will learn how notebooks fit into the broader story of production software.
Module 2: Refactor the code into a workflow
- Participants will use the open-source Metaflow framework to migrate the code from a notebook into a workflow defined in a Python file.
- Participants will learn the basics of structuring projects as they harden results into production-ready workflows.
Module 3: Scale the workflow with Kubernetes
- This section will introduce core Kubernetes concepts, and learners will use Metaflow to access these Kubernetes features from the workflows structured in Module 2.
Module 4: Deploy the workflow to run automatically in production
- Finally, learners will use Metaflow APIs to deploy their workflows to Argo Workflows, a production orchestrator, so they run without human intervention. Learners will understand how production ML workflows are run in practice and get an introduction to building software systems around ML workflows.
Prerequisites:
- Working with Python scripts: importing packages, conditionals, loops, and the fundamentals of object-oriented programming
- Experience using Jupyter notebooks
- Running scripts in the command line
Bio: Eddie Mattia is a data scientist working on Metaflow and foundation models at Outerbounds. He began using Python to teach applied math in grad school. Since then, Eddie has worked at startups and at Intel building machine learning software.