Abstract: This session takes you through the Lightning ecosystem and introduces you to the PyTorch Lightning framework. We'll begin by looking at Lightning Apps, which are fully customizable machine learning workflows, as well as other frameworks within the Lightning ecosystem. We'll then look at the fundamental components of the PyTorch Lightning framework. This session also explores the fundamentals of distributed training and demonstrates how scaling works in ML model training. Using the PyTorch Lightning components, we'll demonstrate how to implement a simple model and scale it with different distributed strategies and accelerators with ease, all without worrying about the engineering details.
Part 1: Overview of Lightning Projects
We will briefly go over several Lightning projects like the Lightning Apps framework, Lightning Flash, and TorchMetrics, and discuss how you can leverage them in your machine learning projects.
Part 2: An Introduction to the PyTorch Lightning Core Components
This section will go through PyTorch Lightning's core components and discuss how they fit into a typical research/data science development pipeline. We'll show you how to organize your PyTorch research code in a LightningModule and walk through the feature-rich PyTorch Lightning Trainer to help you supercharge your ML pipeline.
Part 3: Fundamentals of Distributed Training
We will discuss the core principles of distributed training in machine learning. We'll also talk about why we need it and why it's so complicated. Then, we'll go over two fundamental approaches to distributed training in depth: Data and Model Parallelism.
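To make the data-parallel idea concrete, here is a framework-free toy sketch: each "worker" holds a replica of the model (a single weight) and computes gradients on its own shard of the batch, and an averaging step (standing in for the all-reduce collective) synchronizes them. All names are illustrative, not a real framework API.

```python
def grad_mse(w, shard):
    """Gradient of mean squared error (w*x - y)**2 over one data shard."""
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)


def all_reduce_mean(grads):
    """Stand-in for the collective op that averages gradients across workers."""
    return sum(grads) / len(grads)


w = 0.5
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Data parallelism: split the global batch across two workers.
shards = [batch[:2], batch[2:]]
worker_grads = [grad_mse(w, s) for s in shards]
synced = all_reduce_mean(worker_grads)

# With equal-sized shards, the averaged gradient matches the full-batch one,
# so every replica takes the same update and stays in sync.
assert abs(synced - grad_mse(w, batch)) < 1e-9
```

Model parallelism takes the complementary approach: instead of replicating the model and splitting the data, the model itself is split across devices, which is what makes it the harder of the two to implement.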
Part 4: Using PyTorch Lightning at Scale
In PyTorch Lightning, an accelerator refers to the hardware used for training and inference. Currently, PyTorch Lightning supports several accelerators: CPUs, GPUs, TPUs, IPUs, and HPUs. We will go over some of these accelerators in depth.
As an ML practitioner, you'd rather focus on research than on the engineering logic around hardware.
We'll show you how to easily scale your training for a large dataset across several accelerators such as GPUs, TPUs and HPUs. We'll also go through the essential API internals of how PyTorch Lightning succeeds in abstracting the accelerator logic from users with support for distributed strategies, allowing them to focus on writing accelerator-agnostic code.
Who is it aimed at?
Data scientists and ML engineers who may or may not have used PyTorch Lightning in the past and wish to use distributed training for their models.
What will the audience learn by attending the session?
Learn about the Lightning Ecosystem
Get started with PyTorch Lightning
Get an overview of Distributed Training and several ML accelerators
Train a model with PyTorch Lightning using different accelerators and strategies
Prerequisites: Some familiarity with Python, deep learning terminology, and the basics of neural networks.
Bio: Kaushik Bokka is a Senior Research Engineer at Lightning AI and one of the core maintainers of the PyTorch Lightning library. He has prior experience building production-scale machine learning and computer vision systems for several products, ranging from video analytics to Fashion AI workflows. He has also contributed to a few other open-source projects and aims to empower the way people and organizations build AI applications.