Abstract: Deep learning has exploded into the public consciousness with the advent of ChatGPT and other LLMs. The opportunities presented by these multi-billion-parameter model architectures are tremendous. But many businesses and researchers looking to adopt and fine-tune open-source models for their own applications are running into critical bottlenecks. High costs, long runtimes, extreme compute demands, and GPU scarcity are among the key challenges that practitioners now face with large-model architectures.
Multi-model training, e.g., due to hyperparameter search or multi-user clusters, only compounds these issues by driving up computational demands further still. In this new setting of “multi-large-model” training, three critical interconnected systems issues have emerged: (1) parallelization of models over GPUs, (2) allocation of limited compute resources across jobs, and (3) scheduling.
Optimizing this three-part problem is critical to democratizing large-model architectures. Finding the best parallelization approach, allocating resources efficiently, and eliminating idling can dramatically reduce costs and runtimes, enabling more users to build and fine-tune their own models. Unfortunately, addressing these issues today requires tedious and often impractical manual effort.
Saturn is a new open-source system that tackles the above three issues via data management-inspired joint optimization techniques. We compose a new system architecture that automatically parallelizes models, allocates resources across multiple jobs, and constructs optimized execution schedules. Saturn automates and abstracts away the systems challenges so that AI users can focus on what’s important to them: building new applications. Our experiments show that Saturn reduces workload runtimes by 30-50% (1.4-2X speedups) and can even halve compute costs.
In this talk, we’ll provide an overview of the core ideas behind Saturn, how it works on a technical level to reduce runtimes and costs, and the process of using Saturn for large-model fine-tuning. We’ll demonstrate how Saturn can accelerate and optimize large-model workloads in just a few lines of code and describe some high-value real-world use cases we’ve already seen in industry and academia.
Bio: Kabir Nagrecha is a Ph.D. candidate at UC San Diego, working with Professors Arun Kumar & Hao Zhang. His work focuses on systems infrastructure to support deep learning at scale, aiming to democratize large models and amplify the impact of machine learning applications. He is the recipient of the Meta Research Fellowship, as well as fellowships from the Halicioglu Data Science Institute and Jacobs School of Engineering at UCSD.
Kabir is the youngest-ever Ph.D. student at UCSD, having started his doctorate at the age of 17. He’s previously worked with companies such as Apple, Meta, & Netflix to build the core infrastructure that supports widely-used services such as Siri & Netflix’s recommendation algorithms. Most recently, he’s been working on Saturn, a new system to support automatic parallelization, scheduling, and resource apportioning for training large neural networks.