Speed up Machine Learning Workflows in the Cloud with Lightning


It’s estimated that training GPT-3 on a single GPU in 2020 would have cost around $4.6 million and taken 355 years. Training large models in the cloud successfully requires optimizations that we get out of the box with Lightning, the open-source library by Lightning AI. In this workshop, we will walk through how to use Lightning to speed up machine learning workflows in the cloud. We will begin with an introduction to the different methods of speeding up training, along with their cost implications and technical complexities. Then we’ll learn how to leverage key features of the Lightning library, like LightningDataset and multi-GPU training, to speed up our training workflows in the cloud. By the end of this workshop, you should be able to train a basic model with fast data loading on multiple GPUs.
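
The two figures in the opening sentence can be checked against each other. The sketch below assumes both describe the same single-GPU run billed per GPU-hour, derives the implied hourly price (about $1.48), and shows the cost implication of spreading the work across many GPUs: with perfect scaling the wall-clock time shrinks but the bill does not, and any scaling inefficiency raises it. The function names, the scaling_efficiency parameter, and the 1,024-GPU scenario are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope cost model for the GPT-3 figures quoted above.
# Assumption: $4.6M and 355 GPU-years describe the same run, billed per GPU-hour.
HOURS_PER_YEAR = 365 * 24  # ignores leap days


def implied_hourly_rate(total_cost_usd: float, gpu_years: float) -> float:
    """Per-GPU-hour price implied by a total cost and a GPU-year budget."""
    return total_cost_usd / (gpu_years * HOURS_PER_YEAR)


def wall_clock_years(gpu_years: float, num_gpus: int,
                     scaling_efficiency: float = 1.0) -> float:
    """Elapsed time when the same work is split across num_gpus.

    scaling_efficiency is hypothetical: 1.0 means perfect linear scaling;
    real multi-GPU runs lose some throughput to communication.
    """
    return gpu_years / (num_gpus * scaling_efficiency)


def total_cost_usd(gpu_years: float, num_gpus: int, hourly_rate: float,
                   scaling_efficiency: float = 1.0) -> float:
    """Every GPU is billed for the full elapsed time of the run."""
    elapsed_hours = wall_clock_years(gpu_years, num_gpus, scaling_efficiency) * HOURS_PER_YEAR
    return elapsed_hours * num_gpus * hourly_rate


if __name__ == "__main__":
    rate = implied_hourly_rate(4.6e6, 355)
    print(f"implied price: ${rate:.2f} per GPU-hour")
    for gpus, eff in [(1, 1.0), (1024, 1.0), (1024, 0.8)]:
        days = wall_clock_years(355, gpus, eff) * 365
        cost = total_cost_usd(355, gpus, rate, eff)
        print(f"{gpus:>5} GPUs, efficiency {eff:.0%}: {days:,.0f} days, ${cost / 1e6:.2f}M")
```

Under these assumptions, 1,024 GPUs cut the run from centuries to roughly four months at the same $4.6 million, while an 80% scaling efficiency raises the bill to about $5.75 million. Parallelism buys time; efficiency determines what that time costs.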

Session Outline:

Lesson 1: Overview & Environment setup
Learn why training fast is important and how it affects costs. We’ll review the current challenges with efficient training and how Lightning was built to solve those challenges. Bring your computer so you can set up a basic model that we’ll learn how to train efficiently together.

Lesson 2: Large datasets
Understand the cost considerations when working with large datasets in the cloud. We’ll also review the most common libraries for training with large datasets and learn how to create a custom LightningDataset that works efficiently with the ImageNet dataset on S3.
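
A common pattern for a dataset stored on S3 is to keep only an index of object keys in memory and fetch each sample on demand, which makes the number of remote reads a direct driver of cost and latency. The following is a minimal, framework-free sketch of that pattern, not the LightningDataset API (which this outline does not show): LazyObjectDataset, fetch, and decode are hypothetical names, and fetch stands in for an S3 GET so the example runs without credentials. It implements __len__ and __getitem__, the same protocol PyTorch's map-style datasets use.

```python
from typing import Callable, List


class LazyObjectDataset:
    """Hypothetical map-style dataset that fetches each sample on demand.

    fetch(key) stands in for a cloud object-store read (e.g. an S3 GET);
    it is injected so this sketch runs without any cloud credentials.
    """

    def __init__(self, keys: List[str], fetch: Callable[[str], bytes],
                 decode: Callable[[bytes], object] = lambda raw: raw):
        self.keys = list(keys)   # only the key index lives in memory
        self.fetch = fetch
        self.decode = decode
        self.fetch_count = 0     # remote reads so far, a proxy for request cost

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, index: int):
        raw = self.fetch(self.keys[index])  # one remote read per sample
        self.fetch_count += 1
        return self.decode(raw)


if __name__ == "__main__":
    # An in-memory dict plays the role of the bucket in this sketch.
    fake_bucket = {"train/0.jpg": b"cat", "train/1.jpg": b"dog"}
    ds = LazyObjectDataset(sorted(fake_bucket), fake_bucket.__getitem__,
                           decode=lambda raw: raw.decode())
    print(len(ds), ds[0], ds[1], ds.fetch_count)
```

Counting fetches makes the cost question concrete: one pass over ImageNet's roughly 1.28 million training images issues that many reads unless samples are cached or packed into larger shards.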

Lesson 3: Multi-GPU
Here we’ll go over the cost and operational complexities of multi-GPU training and learn how to use Lightning’s out-of-the-box multi-GPU support.
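
In Lightning, multi-GPU training is typically enabled through Trainer arguments such as accelerator, devices, and strategy (for example strategy="ddp" for distributed data parallel). What data parallelism does to the data is easy to show without a GPU: each device processes a different slice of every epoch. The sketch below, assuming no shuffling, roughly mirrors how PyTorch's DistributedSampler assigns indices: wrap the index list around so every rank gets the same count, then give rank r every world_size-th index starting at r. shard_indices is a hypothetical helper for illustration, not Lightning's implementation.

```python
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Indices that data-parallel worker `rank` would process in one epoch.

    No shuffling. The index list is padded by wrapping around to the start
    so every rank gets the same number of samples and stays in lockstep.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]  # wrap-around padding
    return padded[rank::world_size]  # round-robin assignment


if __name__ == "__main__":
    for r in range(4):
        print(f"rank {r}: {shard_indices(10, 4, r)}")
```

With 10 samples on 4 GPUs, each rank gets 3 indices and samples 0 and 1 are seen twice per epoch; that small duplication keeps every rank doing the same number of steps, and each GPU handling about a quarter of the data is where the wall-clock speedup comes from.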

Lesson 4 (optional): Further challenges
What happens when training doesn’t fit on a single GPU? If time allows, we’ll talk about some of the ongoing challenges with large-scale training and how Lightning continues to evolve to address the hardest and most common of them.
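
A quick memory estimate shows why large-scale training stops fitting on one GPU. Assuming mixed-precision training with Adam, a common accounting is 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two Adam moments), excluding activations, so the result is a lower bound. The 80 GB GPU size and the helper names are assumptions for illustration. For GPT-3’s 175 billion parameters, the fp16 weights alone take 350 GB, and the full training state needs at least 35 such GPUs even when split perfectly evenly.

```python
import math

# Bytes per parameter under two assumptions.
BYTES_FP16_WEIGHTS = 2  # fp16 weights only
# Mixed-precision Adam: fp16 weights (2) + fp16 grads (2)
# + fp32 master weights (4) + Adam momentum (4) + Adam variance (4).
BYTES_MIXED_PRECISION_ADAM = 2 + 2 + 4 + 4 + 4  # = 16; activations not included


def memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory in GB (10**9 bytes) needed for num_params parameters."""
    return num_params * bytes_per_param / 1e9


def min_gpus_if_sharded(num_params: int, bytes_per_param: int,
                        gpu_memory_gb: float) -> int:
    """Lower bound on GPU count if model states are split evenly across GPUs."""
    return math.ceil(memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    gpt3_params = 175_000_000_000
    print(f"fp16 weights: {memory_gb(gpt3_params, BYTES_FP16_WEIGHTS):,.0f} GB")
    print(f"training state: {memory_gb(gpt3_params, BYTES_MIXED_PRECISION_ADAM):,.0f} GB")
    print(f"min 80 GB GPUs: {min_gpus_if_sharded(gpt3_params, BYTES_MIXED_PRECISION_ADAM, 80)}")
```

Sharding strategies such as fully sharded data parallel exist to split these states across GPUs; activations, which this estimate ignores, push the real requirement higher.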


Background Knowledge:

Python, elementary knowledge of ML


Noha Alon joined as a founding team member at Lightning AI and currently holds an engineering leadership position. She leads parts of the effort to build the Lightning AI platform, which aims to revolutionize the AI development workflow. Previously, she worked on ML projects at Glossier and on the LLM team at Microsoft. She holds a bachelor's degree in Software Engineering from Cal Poly, San Luis Obispo.

Open Data Science
One Broadway
Cambridge, MA 02142
