Abstract: Quantization is a common technique for speeding up model inference by reducing the numerical precision of the model, for example to int8. In this tutorial, we will introduce the basics of quantization and the quantization support available in PyTorch.
Lesson One: Quantization Basics (20min)
In this section, we will introduce the motivation for quantization, common quantization techniques, and the benefits quantization can provide.
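As a rough illustration of the basic idea behind int8 quantization, the sketch below maps a list of floats onto the int8 range with a single scale and zero point (per-tensor affine quantization), then maps the integers back to floats to show the rounding error. The function names and the choice of a per-tensor scheme are illustrative assumptions; this is not PyTorch's implementation.

```python
# Minimal sketch of per-tensor affine (asymmetric) quantization to int8.
# Assumptions: one scale/zero_point for the whole list, qmin=-128, qmax=127.
# Illustrative only; not PyTorch's internal implementation.

def compute_qparams(values, qmin=-128, qmax=127):
    """Derive a scale and zero point mapping [min, max] of values onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # include 0.0 so it is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = qmin - round(lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]

def dequantize(qvalues, scale, zero_point):
    """x_hat = (q - zero_point) * scale, an approximation of the original x."""
    return [(q - zero_point) * scale for q in qvalues]

weights = [-1.2, -0.3, 0.0, 0.45, 2.1]
scale, zp = compute_qparams(weights)
q = quantize(weights, scale, zp)
recovered = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, recovered))
print(q, max_error)
```

Each int8 value takes a quarter of the storage of a float32, and the reconstruction error for in-range values is bounded by roughly half the scale; the accuracy cost of that error is what the debugging tools in Lesson Three help track down.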
Lesson Two: PyTorch Quantization (40min)
In this section, we will introduce the tools available for quantizing a PyTorch model and explain when to use each quantization approach.
Lesson Three: Numerical Debugging with Numeric Suite (20min)
What do we do if model accuracy drops too much? How do we find the best configuration for quantizing our model? In this section, we will introduce tools for debugging numerical issues in a quantized model, so that the final model has an acceptable accuracy loss while still delivering the desired inference speedup.
We will leave 5-10min for Q&A.
PyTorch, Python, deep learning
Bio: Jerry Zhang is a Software Engineer on the PyTorch Architecture Optimization team in the AI Frameworks organization at Meta. He has been working on PyTorch Quantization for the past three years, building self-serve, easy-to-use tools that help people optimize the inference speed of their models while maintaining accuracy. Before Meta, he was a master's student in Computer Science at Carnegie Mellon University; he received his Bachelor's degree in Computer Science and Technology from Zhejiang University, China.