Apache Spark & Your Favorite Python Tools: Working Together for Fast Data Science at Scale


We'll start with the basics of machine learning on Apache Spark: when to use it, how it works, and how it compares to all of your other favorite data science tooling.

You'll learn to use Spark (with Python) for statistics, modeling, scoring (inference), and model tuning. You'll also get a peek behind the APIs: why the pieces are arranged as they are, how to get the most out of the docs, the open source ecosystem, and third-party libraries, and how to solve common challenges.

By lunch, you will understand when, why, and how Spark fits into the data science world, and you'll be comfortable doing your own feature engineering and modeling with Spark.

We will then look at some of the newest features in Spark that allow elegant, high-performance integration with your favorite Python tooling. We'll discuss distributed scheduling for popular libraries like TensorFlow, as well as fast model inference, traditionally a challenge with Spark. We'll even see how you can integrate Spark with Python+GPU computation using RapidsAI, or elegantly switch gears to Dask for a more Pythonic approach to many big-data tasks.

By the end of the day, you will be caught up on the latest, easiest, fastest, and most user-friendly ways of applying Apache Spark in your job and/or research.


Adam Breindel consults and teaches widely on Apache Spark, big data engineering, and machine learning. He supports instructional initiatives and teaches as a senior instructor at Databricks, teaches classes on Apache Spark and on deep learning for O’Reilly, and runs a business helping large firms and startups implement data and ML architectures. Adam’s 20 years of engineering experience include streaming analytics, machine learning systems, and cluster management schedulers for some of the world’s largest banks, along with web, mobile, and embedded device apps for startups. His first full-time job in tech was on a neural-net-based fraud detection system for debit transactions, back in the bad old days when some neural nets were patented (!), and he’s much happier living in the age of amazing open-source data and ML tools today.
