
Abstract: It’s estimated that training GPT-3 on a single GPU in 2020 would have cost around US$4.6 million and taken 355 years. Training large models successfully in the cloud requires optimizations, many of which come out of the box with Lightning, the open-source library by Lightning AI. In this workshop, we will walk through how to use Lightning to speed up machine learning workflows in the cloud. We will begin with an introduction to the different methods of speeding up training, along with their cost implications and technical complexity. Then we’ll learn how to leverage key features of the Lightning library, such as LightningDataset and multi-GPU training, to speed up our training workflows in the cloud. By the end of this workshop, you should be able to train a basic model with fast data loading on multiple GPUs.
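The two figures in the opening sentence imply an hourly GPU price. A quick back-of-the-envelope check, assuming the $4.6 million covers only GPU rental for the full 355 years (365-day years, no other costs), looks like this:

```python
# Back-of-the-envelope check of the GPT-3 single-GPU estimate.
# Assumption: the $4.6M is pure GPU rental over 355 years of 365 days each.
TOTAL_COST_USD = 4.6e6
YEARS = 355
HOURS_PER_YEAR = 365 * 24  # ignores leap days

gpu_hours = YEARS * HOURS_PER_YEAR
implied_rate = TOTAL_COST_USD / gpu_hours  # USD per GPU-hour

print(f"{gpu_hours:,} GPU-hours at about ${implied_rate:.2f}/hour")
```

At a fixed hourly price, total cost is proportional to GPU-hours, so anything that reduces the GPU-hours a job consumes reduces the bill.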
Session Outline:
Lesson 1: Overview & Environment setup
Learn why fast training matters and how it affects costs. We’ll review the current challenges with efficient training and how Lightning was built to solve them. Bring your computer so you can set up a basic model that we’ll learn to train efficiently together.
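How training speed translates into cost can be sketched with a simple model: if N GPUs run N * e times faster than one GPU (where e is a hypothetical scaling efficiency between 0 and 1), wall-clock time is T / (N * e) and the bill is T * price / e, where T is the single-GPU time. The function name and the job numbers below are illustrative assumptions, not measurements.

```python
def training_time_and_cost(single_gpu_hours, num_gpus, scaling_efficiency, usd_per_gpu_hour):
    """Return (wall-clock hours, total USD) for data-parallel training.

    Assumes a simple linear-scaling model: N GPUs run
    N * scaling_efficiency times faster than a single GPU.
    """
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    wall_clock = single_gpu_hours / (num_gpus * scaling_efficiency)
    cost = wall_clock * num_gpus * usd_per_gpu_hour  # every GPU is billed for every hour
    return wall_clock, cost


# Hypothetical job: 800 hours on one GPU at $1.50 per GPU-hour.
for n, eff in [(1, 1.0), (8, 1.0), (8, 0.8)]:
    hours, usd = training_time_and_cost(800, n, eff, 1.50)
    print(f"{n} GPU(s), efficiency {eff:.0%}: {hours:.0f} h, ${usd:,.0f}")
```

Under this model, a shorter wall-clock time comes at no extra cost only when scaling is near-perfect; every point of lost efficiency shows up directly on the bill. Real scaling efficiency depends on the model, the interconnect, and the data pipeline.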
Lesson 2: Large datasets
Understand the cost considerations of working with large datasets in the cloud. We’ll also review the most common libraries for training on large datasets and learn how to create a custom LightningDataset that efficiently reads the ImageNet dataset from S3.
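A dataset that streams ImageNet from S3 has to ensure that each GPU process (rank) and each DataLoader worker reads a different slice of the objects, so no image is downloaded twice per epoch. The sketch below shows only that index arithmetic in plain Python; it is not the LightningDataset API, and the `shard_keys` helper and the key names are hypothetical.

```python
def shard_keys(keys, rank, world_size, worker_id, num_workers):
    """Return the subset of object keys one DataLoader worker on one rank should read.

    Keys are split round-robin first across ranks (GPU processes), then across
    the workers inside each rank, so every key is read exactly once overall.
    """
    rank_keys = keys[rank::world_size]
    return rank_keys[worker_id::num_workers]


# Hypothetical S3 object keys for a handful of ImageNet training images.
keys = [f"imagenet/train/img_{i:05d}.JPEG" for i in range(10)]

for rank in range(2):          # 2 GPUs
    for worker in range(2):    # 2 DataLoader workers per GPU
        print(rank, worker, shard_keys(keys, rank, 2, worker, 2))
```

In a PyTorch pipeline, the rank and world size would typically come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`, and the worker id and count from `torch.utils.data.get_worker_info()`. Shuffling the key list with a seed shared by all processes before sharding keeps the slices disjoint while still varying the order each epoch.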
Lesson 3: Multi-GPU
Here we’ll go over the cost and operational complexities of multi-GPU training and learn how to use Lightning’s out-of-the-box multi-GPU support.
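In Lightning, multi-GPU training is typically switched on through `Trainer` arguments, for example `accelerator="gpu"`, `devices=4`, and `strategy="ddp"`. The idea behind data-parallel (DDP) training can be shown without any GPUs: each replica computes a gradient on its own shard of the batch, and averaging those gradients (the all-reduce step) gives the same update as one GPU processing the whole batch. The pure-Python sketch below assumes a one-parameter linear model, squared-error loss, and equal-sized shards; it illustrates the math, not Lightning's implementation.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error with respect to w for the model y_hat = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


# A global batch of 8 samples, split evenly across 4 simulated GPUs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]
w = 0.5
world_size = 4

# Each simulated rank computes a gradient on its own shard only.
shard = len(xs) // world_size
local_grads = [
    grad_mse(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
    for r in range(world_size)
]

# All-reduce (mean): every rank ends up with the same averaged gradient.
ddp_grad = sum(local_grads) / world_size
full_batch_grad = grad_mse(w, xs, ys)

print(ddp_grad, full_batch_grad)  # equal up to floating-point rounding

# One SGD step; every replica applies the identical update, so weights stay in sync.
lr = 0.01
w -= lr * ddp_grad
```

Equal shard sizes matter here: with uneven shards, a plain mean of per-shard means no longer equals the full-batch mean, which is one reason distributed samplers pad or drop samples so every rank receives the same number of them.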
Lesson 4 (optional): Further challenges
What happens when training doesn’t fit on a single GPU? If time allows, we’ll talk about some of the ongoing challenges with large-scale training and how Lightning continues to evolve to address the hardest and most common of them.
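A rough way to see why large models stop fitting on one GPU is to count the bytes needed for model states alone. The sketch below assumes mixed-precision training with Adam at a common rule of thumb of 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two fp32 Adam moments) and ignores activations, buffers, and fragmentation, so real requirements are higher. The 80 GB GPU size is also an assumption.

```python
import math

# Bytes per parameter for mixed-precision Adam training (rule of thumb):
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + fp32 Adam first moment (4) + fp32 Adam second moment (4) = 16
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def model_state_gb(num_params):
    """Memory in GB for weights, gradients, and optimizer states only (no activations)."""
    return num_params * BYTES_PER_PARAM / 1e9


def min_gpus_for_states(num_params, gpu_memory_gb=80):
    """Lower bound on GPUs needed if model states were sharded perfectly across them."""
    return math.ceil(model_state_gb(num_params) / gpu_memory_gb)


for name, params in [("1B model", 1_000_000_000), ("GPT-3 (175B)", 175_000_000_000)]:
    print(f"{name}: {model_state_gb(params):,.0f} GB of model states, "
          f">= {min_gpus_for_states(params)} x 80 GB GPUs")
```

Sharding model states across GPUs is one way to make such models fit; Lightning exposes sharded strategies through the `Trainer`'s `strategy` argument, for example `strategy="fsdp"`.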
https://github.com/Lightning-AI/lightning
Background Knowledge:
Python, elementary knowledge of ML
Bio: Daniela Dapena is a research scientist at Lightning AI, where she works on deep learning models and front-end development. She obtained her Ph.D. from the University of Delaware in 2022, where her research focused on graph signal processing and its applications to machine learning. She received her bachelor’s degree in electrical engineering from the Universidad de Los Andes in Mérida, Venezuela, in 2018.

Daniela Dapena, PhD
Community Research Scientist | Lightning AI
