
Abstract: This session explores how large language models (LLMs) can improve the development of Apache Spark™ applications.
We'll demonstrate how LLMs can simplify SQL query creation, data ingestion, and DataFrame transformations, leading to faster development and more precise code that's easier to review and understand. We'll also show how LLMs can assist in creating visualizations and clarifying data insights, making complex data easy to understand.
We'll also discuss how LLMs can be used to create user-defined data sources and functions, making Apache Spark applications more adaptable.
The session is built around practical examples of LLMs in Apache Spark development. Join us to explore how these models can drive innovation and boost efficiency in data science and AI.
Attendees will learn how to simplify code generation for open-source Apache Spark using both open-source and proprietary LLMs.
Bio: Gengliang Wang is an Apache Spark PMC Member and Committer who actively works on important Spark projects, including ANSI SQL mode, the TIMESTAMP_NTZ data type, and data sources. His contributions extend to enhancing the SQL compiler and UI. He is also engaged in a project that uses large language models to streamline Spark application development. His work reflects a dedication to improving Apache Spark and making it more user-friendly.

Gengliang Wang
Title: Senior Software Engineer | Databricks
