Abstract: As deep learning and AI adoption grows, so do the size and complexity of the models. Training large models such as GPT-2 and Megatron has been a daunting task. Several distributed computing frameworks address this problem, and one of the oldest and most robust options is OpenMPI, an open-source implementation of the Message Passing Interface (MPI) standard.
OpenMPI is used for high-performance computing at supercomputing centers as part of distributed computing systems. In the first half of this workshop, we will learn more about OpenMPI and work with it hands-on through a Python interface. Examples of MPI basics will include estimating pi with distributed computation and running Python scripts with OpenMPI. The second half of the workshop will focus on using DeepSpeed with OpenMPI and working through a few examples.
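The workshop's own pi code is not reproduced here. The following is a minimal sketch, assuming the common midpoint-rule approach (pi equals the integral of 4/(1+x^2) from 0 to 1), of how that work can be divided among MPI ranks. It simulates the ranks serially so it runs without an MPI installation. The function names `partial_pi` and `estimate_pi` are hypothetical, chosen for illustration only.

```python
import math


def partial_pi(rank: int, size: int, n: int) -> float:
    """Midpoint-rule contribution of one rank to pi = integral_0^1 4/(1+x^2) dx.

    Rank `rank` handles intervals rank, rank + size, rank + 2*size, ...
    (a round-robin split), so all ranks together cover each of the n
    intervals exactly once.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(rank, n, size):
        x = h * (i + 0.5)  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size: int, n: int) -> float:
    """Serially simulate `size` ranks, then sum their partial results (the reduce step)."""
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    approx = estimate_pi(size=4, n=100_000)
    print(f"pi ~ {approx:.10f}, error = {abs(approx - math.pi):.2e}")
```

In a real MPI program launched with `mpirun`, each process would call `partial_pi` with its own rank and the communicator size (for example from mpi4py's `comm.Get_rank()` and `comm.Get_size()`), and the partial sums would be combined with a reduction such as `comm.reduce(local, op=MPI.SUM, root=0)`. The round-robin split keeps the per-rank workload balanced without any communication until the final reduce.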
DeepSpeed, a library released in 2020, was developed to train very large models and can run on top of OpenMPI. It is flexible, can aid in hyperparameter tuning and transfer learning, and integrates closely with the Hugging Face Transformers library. For training large sequence-to-sequence models, DeepSpeed is one of the leading options. The DeepSpeed examples we will cover include English-to-Romanian translation, transfer learning for summarization on the CNN/Daily Mail dataset, and a proteomics example.
We will discuss the benefits of OpenMPI and DeepSpeed, and when not to use them. We will also present other approaches to distributed computing and compare them with MPI plus DeepSpeed.
Prerequisites: intermediate knowledge of Python and some exposure to neural networks. Little to no knowledge of C++ and no familiarity with DeepSpeed are required.
Jennifer Dawn Davis
Staff Field Data Scientist | Domino Data Lab