Kubernetes for Data Scientists


Data scientists don't necessarily need to know much about Kubernetes (K8s), yet many are interested in it. Knowledge of K8s can be a big productivity booster as well as a differentiator in the job market. So we're here to empower your K8s journey! This workshop is a gentle introduction for data scientists who want to take their production DS/ML chops to the next level.

In this workshop, you’ll learn:

- What K8s is and what it isn’t;
- Why you should care;
- The differences between K8s, Docker, and other moving parts (such as Terraform);
- How to understand the lifecycle of a K8s project;
- What the core concepts of deploying K8s workloads are;
- Which Kubernetes APIs are most relevant to data science workloads.

Session Outline:

Module 1: Develop locally in a notebook
- Iterate on a machine learning task/model in a familiar Jupyter environment; we will provide a hosted environment that participants can access in a web browser.
- This module sets the stage in a familiar setting, showing data analysts and scientists how notebooks fit into the broader story of production software.

Module 2: Refactor the code into a workflow
- Participants will use the open-source Metaflow framework to migrate the code from a notebook into a workflow defined in a Python file.
- Participants will learn the basics of structuring projects as they harden notebook results into production-ready workflows.

Module 3: Scale the workflow with Kubernetes
- This section will introduce Kubernetes concepts, and learners will discover how to use Metaflow to access these Kubernetes features from the workflows structured in Module 2.

Module 4: Deploy the workflow to run automatically in production
- Finally, learners will use Metaflow APIs to deploy their workflows to run on Argo Workflows, a production orchestrator, without human intervention. Learners will understand how production ML workflows are run in practice and get an introduction to building software systems around ML workflows.
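With Metaflow, deployment to Argo Workflows happens from the command line; a sketch, assuming the flow from the earlier modules lives in a file named `training_flow.py`:

```shell
# Compile the flow and register it with Argo Workflows on the cluster.
python training_flow.py argo-workflows create

# Kick off a production run; once deployed, Argo can also launch runs
# on a schedule or in response to events, with no human in the loop.
python training_flow.py argo-workflows trigger
```

These are cluster deployment commands rather than a runnable snippet; after `create`, the workflow exists on the cluster independently of the laptop that deployed it.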

Background Knowledge:

- Working with Python scripts - importing packages, conditionals, loops, fundamentals of object-oriented programming
- Experience using Jupyter notebooks
- Running scripts in the command line


Shri Javadekar is currently an engineer at Outerbounds, focused on building a fully managed, large-scale platform for running data-intensive ML/AI workloads. Earlier, he was a co-founder of an MLOps company called Novus Labs. Prior to that, he led the design, development, and operations of Kubernetes-based infrastructure at Intuit, running thousands of applications built by hundreds of teams and transacting billions of dollars. He is a founding engineer of the Argo open-source project and also spent precious time at multiple startups that were acquired by large organizations such as EMC/Dell and VMware.

Open Data Science
One Broadway
Cambridge, MA 02142
