Abstract: PyTorch is a deep learning framework for building AI models that accelerates the path from research prototyping to production deployment. The most recent release, PyTorch 2.1, adds new features around compilation, distributed training, inference, export, and edge deployment. PyTorch also supports inference optimization techniques such as memory-efficient attention, quantization, and pruning, which enable popular generative AI models to use less memory and run faster during inference. Benchmarking popular generative AI models with the latest techniques in PyTorch, we see up to ~8.5x speedup for Segment Anything and ~5.6x for Llama 2. In this session we will dive deep into these new developments and techniques in PyTorch and provide recommendations on how you can accelerate your models using native PyTorch code.
Learning objectives: How to leverage PyTorch to accelerate AI models
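As a flavor of what the session covers, the optimizations named in the abstract are all exposed as native PyTorch APIs. The sketch below is illustrative only (toy tensor sizes and a toy model, not the benchmarked Segment Anything or Llama 2 workloads) and assumes PyTorch >= 2.0:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fused scaled dot-product attention: dispatches to memory-efficient
# or flash-attention kernels when available on the current hardware.
query = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim) - toy sizes
attn_out = F.scaled_dot_product_attention(query, query, query)

# Post-training dynamic quantization: replaces Linear layers with
# int8 dynamically quantized equivalents (toy model for illustration).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# torch.compile wraps the model for JIT compilation; the actual
# compilation work is deferred until the first forward call.
compiled = torch.compile(model)
```

Each of these is a one- or two-line change on top of an existing eager-mode model, which is the "native PyTorch" acceleration story the session walks through.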
Bio: Supriya is an Engineering Manager working on PyTorch at Meta. Her team works on architecture optimization techniques such as quantization and pruning, as well as other core components of PyTorch 2.0, enabling users to run AI models efficiently on different hardware using native PyTorch. Prior to Meta, she was a software engineer at NVIDIA, where she worked on improving GPU architecture and accelerating AI models for inference via TensorRT. Supriya holds an MS in CSE from the University of Michigan, Ann Arbor, and a bachelor's degree from BITS Pilani, India.