A Bamboo of Pandas: Crossing Pandas’ Single-machine Barrier with Apache Spark


Pandas is a fast and powerful open-source data analysis and manipulation library for Python. Apache Spark is an open-source unified analytics engine for large-scale data processing. Both are widely adopted in the Data Engineering and Data Science communities.

Combining the two offers real gains in productivity, scalability and performance, yet this combination is often overlooked.

In this talk, we will show you how you can leverage recent developments in Apache Spark together with Pandas to enjoy the best of both worlds!

We will cover the following topics:
* Core concepts
* The motivation for combining Spark and Pandas
* Using Pandas code with Spark
* The new Pandas API in Spark


Itai Yaffe is a Senior Solutions Architect at Databricks. Prior to Databricks, Itai was a Principal Solutions Architect at Imply, and before that a big data tech lead at Nielsen Identity, where he dealt with big data challenges using tools like Spark, Druid, Kafka, and others. He is also part of the core team of the Israeli chapter of Women in Big Data. Itai is keen on sharing his knowledge and has presented his real-life experience in various forums.

Open Data Science




