Abstract: Dask is a parallel computing library for Python. This talk is a gentle introduction to Dask, showing how you can speed up data science code on your laptop with a simple "pip install". We will then use the same code to process big data on a cluster of machines. Along the way, we will walk through an end-to-end data science pipeline, from ETL and exploratory analysis to machine learning model training and scoring.
Bio: Aaron Richter is a software developer turned data engineer and data scientist. He has pioneered the development and implementation of large-scale data science infrastructure in both business and research environments. Inevitably, he has spent a lot of time finding efficient ways to clean data, run pipelines, and tune models. Aaron holds a PhD in machine learning from Florida Atlantic University.
Aaron Richter, PhD
Senior Data Scientist at Saturn Cloud