Abstract: On-device AI model inference is essential for delivering real-time experiences on mobile phones, since it requires execution locally on the device. However, building a scalable solution that addresses the wide range of hardware specifications and platform requirements across Android and iOS comes with many constraints: devices have limited storage and memory, and the platform app stores impose restrictions on the size of the app package.
ONNX Runtime Mobile is a new feature that addresses these needs for developers building solutions for mobile devices. You can build a reduced-size binary package to integrate into your phone application and run inference on your ONNX models locally on the device.
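As a rough illustration of the offline part of this workflow (the model path is a placeholder and the exact CLI options vary by onnxruntime version), an ONNX model is converted to the ORT format consumed by the reduced-size mobile build; the same conversion also produces an operator configuration that can be used to trim the runtime binary to just the operators the model needs:

```python
# Sketch of preparing a model for ONNX Runtime Mobile (offline, on a dev machine).
# "model.onnx" / "model.ort" are hypothetical file names.
import subprocess
import onnxruntime as ort

# Convert model.onnx -> model.ort and emit a required-operators config
# that a minimal ONNX Runtime build can use to exclude unused operators.
subprocess.run(
    ["python", "-m", "onnxruntime.tools.convert_onnx_models_to_ort", "model.onnx"],
    check=True,
)

# Smoke-test the converted model with the Python API before bundling it in
# the app package; on the device, the mobile bindings load the same .ort file.
session = ort.InferenceSession("model.ort")
print([inp.name for inp in session.get_inputs()])
```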
ONNX quantization techniques can be used to further reduce model size by converting FP32 weights to INT8, and INT8 execution delivers improved performance on the ARM processors found in mobile phones.
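As a minimal sketch of how such post-training quantization is typically applied with the onnxruntime Python tooling (file names here are placeholders, not part of the talk), FP32 weights can be converted to INT8 offline:

```python
# Post-training dynamic quantization: store weights as INT8 to shrink the model.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model.onnx",         # original FP32 model (hypothetical path)
    model_output="model.quant.onnx",  # quantized model with INT8 weights
    weight_type=QuantType.QInt8,      # weights stored as signed 8-bit integers
)
```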
Bio: Manash Goswami is a product management leader with a passion for solving problems and building products that meet end-user needs. He has grown from development roles (coding, test and verification, systems design and architecture) into product management and business development, and has delivered consumer electronics products ranging from smartwatches to tablets. He has a deep understanding of the Android, Chrome, and Windows ecosystems. He has demonstrated leadership by managing through influence across a matrixed organizational structure, driving new initiatives internally and with customers, enabling collaboration across functional teams, and delivering financial results. He excels at dealing with ambiguity and managing complex, cross-group interactions, and successfully ties long-term strategic thinking to near-term execution.