From Research to Production: Performant Cross-Platform ML/DNN Model Inferencing on Cloud and Edge with ONNX Runtime


Machine learning models trained with frameworks such as scikit-learn, PyTorch, TensorFlow, and Keras can be challenging to deploy, maintain, and operationalize with good performance for latency-sensitive customer scenarios. Using the standard Open Neural Network Exchange (ONNX) model format and the open source, cross-platform ONNX Runtime inference engine, these models can be deployed at scale to cloud solutions on Azure as well as to local devices running Windows, Mac, or Linux and to various IoT hardware. Once converted to the interoperable ONNX format, the same model can be served with ONNX Runtime across a wide variety of technology stacks, providing maximum flexibility and reducing deployment friction.

In this workshop, we will explore the versatility of ONNX and ONNX Runtime by converting traditional ML scikit-learn pipelines to ONNX, applying transfer learning techniques, and exporting PyTorch-trained deep neural network models to ONNX. We'll then work through how these models can be deployed to Azure as a cloud service using Azure Machine Learning services, and to Windows or Mac devices for on-device inferencing.

The production-ready ONNX Runtime is already used in many key Microsoft products and services, such as Bing, Office, Windows, and Cognitive Services, where it has realized performance improvements of 2x or more on average in high-traffic scenarios.

ONNX Runtime supports inferencing of ONNX format models on Linux, Windows, and Mac, with Python, C, and C# APIs. The extensible architecture supports graph optimizations (node elimination, node fusions, etc.) and partitions models to run efficiently on a wide variety of hardware, leveraging custom accelerators, computation libraries, and runtimes where available. These pluggable "execution providers" work with CPU, GPU, FPGA, and more.
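To make the ideas of node elimination and partitioning across execution providers more concrete, here is a small toy sketch in plain Python. It is not ONNX Runtime's implementation or API: the `Node` class, the `eliminate_identity` and `partition` helpers, the example graph, and the "HypotheticalGPU" provider with its supported-operator set are all assumptions made up for illustration. The sketch removes no-op Identity nodes, then assigns each remaining node to the first provider in a priority list that supports its operator, grouping consecutive nodes that land on the same provider.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Node:
    name: str
    op: str
    inputs: tuple
    output: str


def eliminate_identity(nodes):
    """Drop Identity nodes and rewire their consumers to the Identity's input.

    Assumes nodes are in topological order and that no Identity output
    is itself a graph output (a simplification for this toy example).
    """
    alias = {}  # tensor name produced by an Identity -> the tensor it copies
    kept = []
    for node in nodes:
        inputs = tuple(alias.get(i, i) for i in node.inputs)
        if node.op == "Identity":
            alias[node.output] = inputs[0]
        else:
            kept.append(Node(node.name, node.op, inputs, node.output))
    return kept


def partition(nodes, providers):
    """Assign each node to the first provider (in priority order) that
    supports its op, then group consecutive nodes sharing a provider."""
    def pick(node):
        for provider_name, supported_ops in providers:
            if node.op in supported_ops:
                return provider_name
        raise ValueError(f"no provider supports op {node.op!r}")

    return [(prov, [n.name for n in group])
            for prov, group in groupby(nodes, key=pick)]


# Hypothetical five-node graph, listed in topological order.
graph = [
    Node("conv1", "Conv", ("x", "w1"), "a"),
    Node("id1", "Identity", ("a",), "b"),
    Node("relu1", "Relu", ("b",), "c"),
    Node("nms", "NonMaxSuppression", ("c",), "d"),
    Node("fc", "MatMul", ("d", "w2"), "y"),
]

# Hypothetical providers in priority order; the CPU entry acts as the fallback.
providers = [
    ("HypotheticalGPU", {"Conv", "Relu", "MatMul"}),
    ("CPU", {"Conv", "Relu", "MatMul", "NonMaxSuppression", "Identity"}),
]

optimized = eliminate_identity(graph)
plan = partition(optimized, providers)
for provider_name, node_names in plan:
    print(provider_name, node_names)
```

In this toy graph the Identity node disappears, and the operator that the accelerator does not support falls back to the CPU, splitting the model into three subgraphs. A real runtime would also weigh the cost of moving data between devices, which this sketch ignores.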

ONNX is a standard format for DNN and traditional ML models, developed by Microsoft, Facebook, and a number of other leading companies in the AI industry. The interoperable format gives data scientists the flexibility to use their frameworks and tools of choice, accelerating the path from research to production. It also allows hardware partners to design optimizations for deep-learning-focused hardware based on a standard specification that is compatible with many frameworks.


Faith Xu is a Senior Program Manager at Microsoft on the Machine Learning Platform team, focusing on frameworks and tools. She leads efforts to enable efficient and performant productization of inferencing workflows for high-volume Microsoft products and services through the use of ONNX and ONNX Runtime. She is an evangelist for adoption of the open source ONNX standard with community partners to promote an open ecosystem in AI.
