Abstract: On-device AI inference is essential for real-time mobile experiences that cannot tolerate a round trip to the cloud. Building a scalable solution is challenging, however, given the wide range of hardware specifications and platform requirements across Android and iOS: devices have limited storage and memory, and the platform app stores impose restrictions on the size of the app package.
ONNX Runtime Mobile is a new feature that addresses these needs. Developers can build a reduced-size binary package to integrate with their phone application and run inference on ONNX models locally on the device.
ONNX quantization techniques can reduce model size by converting FP32 weights to INT8, and INT8 execution also improves performance on the ARM processors found in mobile phones.
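To make the size reduction concrete, here is a minimal pure-Python sketch of symmetric INT8 weight quantization, the basic idea behind converting FP32 weights to INT8. This is illustrative only and is not the ONNX Runtime quantization API; the function names are hypothetical.

```python
# Illustrative sketch of symmetric INT8 quantization (not the ONNX Runtime API).
# Each weight is mapped to an integer in [-127, 127] via a shared scale factor,
# shrinking storage from 4 bytes (FP32) to 1 byte (INT8) per weight.

def quantize_int8(weights):
    """Map a list of FP32 weights to INT8 values using one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from the INT8 representation."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# The round-trip error per weight is bounded by half a quantization step (scale / 2).
```

In practice, ONNX Runtime's quantization tooling applies this kind of transformation per tensor (or per channel) across the whole model, which is where the roughly 4x reduction in weight storage comes from.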
Bio: Emma Ning is a Senior Product Manager on the AI Frameworks team in the Microsoft Cloud + AI group, focusing on AI model operationalization and acceleration with ONNX/ONNX Runtime for open and interoperable AI. She has more than five years of product experience in search engines leveraging machine learning techniques, and has spent more than three years exploring AI adoption across various businesses. She is passionate about bringing AI solutions to business problems and enhancing product experiences.