Abstract: Dask is the leading Python-native framework for distributed computing, with a growing open-source community. Commercial enterprises and scientific communities use Dask to schedule and coordinate task execution in data engineering and data science pipelines.
In this talk, you’ll learn:
How Dask is used to scale up existing Python libraries, like NumPy, pandas, and others
How Dask is used for large-scale distributed computing in financial services and other advanced use cases where parallelizing workloads produces speedups
How to take advantage of GPUs for computationally intensive workloads
The Dask roadmap
Attendees should have some basic knowledge of Dask or be active Dask users.
Bio: David started his career building predictive models at Allstate Insurance for pricing auto and homeowners insurance. After organizing a Kaggle competition to improve Allstate’s pricing models, he joined Kaggle as an early employee helping other companies make use of Kaggle competitions. At Kaggle, and later DataRobot, his career shifted toward software engineering, especially Python. David spent a few years with AWS Elastic File System improving their cloud infrastructure, co-founded Decision AI (later acquired by DataRobot), and is now a Senior Software Engineer at Coiled, working to help data scientists easily scale Python.
David is a speaker at technical and industry events, such as Open Data Science Conference and PyCon. He obtained his Master of Science from the University of Chicago, where he focused on research investigating machine learning methods for high-dimensional (especially semi-supervised) settings that exploit a lower-dimensional manifold structure by finding a good basis for functions that are smooth on the manifold. When not working as a Senior Software Engineer at Coiled, David loves racing small sailboats on the Charles River and Boston Harbor, and volunteers teaching new sailors at Boston’s Community Boating.