Abstract: Pandas is a fast and powerful open-source data analysis and manipulation framework written in Python. Apache Spark is an open-source unified analytics engine for large-scale data processing. Both are widely adopted in the Data Engineering and Data Science communities.
Although combining them offers great value in terms of productivity, scalability, and performance, this is often overlooked.
In this talk, we will show how you can combine recent developments in Apache Spark with Pandas to enjoy the best of both worlds!
We will cover the following topics:
* Core concepts
* The motivation for combining Spark and Pandas
* Using Pandas code with Spark
* The new Pandas API in Spark
Bio: Itai Yaffe is a Senior Solutions Architect at Databricks. Prior to Databricks, Itai was a Principal Solutions Architect at Imply, and before that, a big data tech lead at Nielsen Identity, where he dealt with big data challenges using tools like Spark, Druid, Kafka, and others. He is also part of the core team of the Israeli chapter of Women in Big Data. Itai is keen on sharing his knowledge and has presented his real-life experience in various forums.