
Abstract: This session explores how large language models (LLMs) can streamline the development of Apache Spark™ applications.
We'll demonstrate how LLMs can simplify SQL query creation, data ingestion, and DataFrame transformations, leading to faster development and more precise code that's easier to review and understand. We'll also show how LLMs can assist in creating visualizations and clarifying data insights, making complex data easier to interpret.
Furthermore, we'll discuss how LLMs can be used to create user-defined data sources and functions, offering higher adaptability in Apache Spark applications.
Through practical examples, the session highlights the role of LLMs in Apache Spark development. Join us to explore how these models can drive innovation and boost efficiency in data science and AI.
Attendees will learn how to simplify code generation for open-source Apache Spark using both open-source and proprietary LLMs.
Bio: Allison is a Senior Software Engineer at Databricks where she focuses on Spark SQL and PySpark. Before Databricks, she was an early member of Robinhood’s data team. She holds a Bachelor’s degree in Computer Science from Carnegie Mellon University.

Allison Wang
Senior Software Engineer | Databricks
