Why GPU Clusters Don’t Need to Go Brrr? Leverage Compound Sparsity to Achieve the Fastest Inference Performance on CPUs


Forget specialized hardware. Get GPU-class performance on your commodity CPUs with compound sparsity and sparsity-aware inference execution.
This talk will demonstrate the power of compound sparsity for model compression and inference speedup in the NLP and CV domains, with a special focus on the recently popular Large Language Models. Combining structured and unstructured pruning (to 90%+ sparsity), quantization, and knowledge distillation produces models that run an order of magnitude faster than their dense counterparts, without a noticeable drop in accuracy. This combination is the key enabler of fast inference for modern neural networks on CPUs. Session participants will learn the theory behind compound sparsity, the state-of-the-art techniques, and how to apply them in practice using the Neural Magic platform.
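As a rough illustration of two of the ingredients named above, the sketch below applies unstructured magnitude pruning to 90% sparsity and then symmetric int8 quantization to a toy weight vector. It is a minimal pure-Python sketch under stated assumptions, not the Neural Magic platform's API: the function names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the random weights are hypothetical, and structured pruning and knowledge distillation are left out because they require a training loop.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude weights until `sparsity` of them are zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization; returns (integers, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floating-point weights."""
    return [q * scale for q in quantized]


# Hypothetical toy "layer": 1000 weights drawn from a standard normal.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

pruned = magnitude_prune(weights, 0.9)      # keep only the largest 10%
quantized, scale = quantize_int8(pruned)    # store survivors as int8
restored = dequantize(quantized, scale)

zero_fraction = sum(1 for q in quantized if q == 0) / len(quantized)
print(f"fraction of zero weights after pruning + quantization: {zero_fraction:.2f}")
```

Pruned weights stay exactly zero after quantization, so a sparsity-aware runtime can skip them while doing the remaining arithmetic in 8-bit integers. In practice, pruning is usually applied gradually during training so that accuracy can recover, which this one-shot sketch does not attempt.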


Konstantin Gulin is a Machine Learning Engineer at Neural Magic working on bringing sparse computation to the forefront of industry. With prior experience in applying machine learning to remote sensing (NASA) and space mission simulation (The Aerospace Corporation), he’s turned his focus to enabling effective model deployment in even the most constrained environments. He’s passionate about technology and ethical engineering and strives for the thoughtful advancement of AI.

Open Data Science




One Broadway
Cambridge, MA 02142
