Cloud Native Data Science with Dask

Abstract: 

Python has become a great language for data science. Libraries like NumPy, pandas, and Scikit-Learn provide high-performance, pleasant APIs for analyzing data. However, they’re focused on single-core, in-memory analytics, and so don't scale out to very large datasets or clusters of machines. That's where Dask comes in.

Dask is a library that natively scales Python. It works with libraries like NumPy, pandas, and Scikit-Learn to operate on datasets in parallel, potentially distributed on a cluster.

Moving to a cloud-native data science workflow will make you and your team more productive. You'll be able to iterate more quickly through the cycle of data collection, visualization, modeling, testing, and deployment.

Attendees will learn the high-level user interfaces Dask provides, such as dask.array and dask.dataframe. These let you write regular Python, NumPy, or pandas code that is then executed in parallel on datasets that may be larger than memory. We'll learn through hands-on exercises. Each attendee will be provided with their own Dask cluster to develop and run their solutions.

Dask is a flexible parallelization framework; we'll demonstrate that flexibility with some machine-learning workloads. We'll use Dask to distribute a large scikit-learn grid search across a cluster of machines. We'll use Dask-ML to work with larger-than-memory datasets.
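One common way to hand a scikit-learn grid search to Dask is through joblib's "dask" backend; the abstract does not say which approach the session uses, so treat the following as a minimal sketch under that assumption. The dataset, estimator, and parameter grid are made up, and a local in-process client stands in for a real cluster.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Local, thread-based stand-in for a cluster (assumption for this sketch).
# With a real deployment you would pass the scheduler address to Client.
client = Client(processes=False)

# Small synthetic dataset and an illustrative parameter grid.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=3)

# Inside this context, scikit-learn's internal joblib calls are routed
# to the Dask workers instead of local threads or processes.
with joblib.parallel_backend("dask"):
    search.fit(X, y)

best_params = search.best_params_
n_candidates = len(search.cv_results_["params"])
print(best_params)

client.close()
```

This pattern parallelizes the model fits but still assumes the training data fits in memory on each worker; larger-than-memory data is the case Dask-ML is aimed at.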

We'll see how Dask can be deployed on Kubernetes, taking advantage of features like auto-scaling, where worker pods are automatically created or destroyed based on the current workload.

Bio: 

Tom is a Data Scientist and developer at Anaconda and works on open source projects including Dask and pandas. Tom’s current focus is on scaling out Python's machine learning ecosystem to larger datasets and larger models.
