
Abstract: In this talk, attendees will get an introduction to Dask, a distributed computing framework in the PyData ecosystem.
The first half of the talk will describe the current state of the project and its ecosystem, including distributed data collections, cloud deployment options, distributed machine learning projects, and workflow orchestration.
The second half of the talk will be a live demo showing the programming model for machine learning on Dask. Dask's potential for speeding up machine learning workflows will be demonstrated with an intermediate-level tutorial on training XGBoost and LightGBM models with Dask.
Bio: James Lamb is a software engineer at Saturn Cloud, where he works on a managed data science platform built on Dask and Kubernetes. Before Saturn Cloud, James worked on industrial internet of things (IIoT) problems as a data scientist at AWS and Chicago-based Uptake. He is a core maintainer of LightGBM and has contributed to other open source data science and data engineering projects such as XGBoost and Prefect. James holds master's degrees in Applied Economics (Marquette University) and Data Science (University of California, Berkeley).