
Abstract: On-device machine learning models are models that are trained and deployed directly on a mobile or edge device rather than in the cloud. This approach lets the device perform complex tasks, such as image recognition or natural language processing, in real time, with accuracy and consistency comparable to cloud-based models. For example, the beauty filter in your favourite video app that adapts to each user's facial features is actually a facial recognition model running on your mobile phone. By running machine learning algorithms on-device, users can avoid uploading sensitive data to the cloud, retain more autonomy over their personal information, and benefit from real-time decision-making, all of which are becoming more and more important as AI advances rapidly. As a result, on-device machine learning models have gained attention in recent years, driven by the increasing computing power of mobile and edge devices as well as the growing demand for privacy-preserving AI applications.
However, on-device machine learning systems still face many limitations. Edge devices usually have limited computational resources, such as CPU and GPU, which makes it challenging to train and run complex machine learning models on-device and produce satisfactory results in real time. Limited storage and battery life also place strict constraints on how models can be used, since running such ML models consumes large amounts of power. Furthermore, because of these problems, on-device ML models still struggle to match the performance of models trained on dedicated computing resources such as the cloud. This talk serves as an intro-level course on the latest advances in on-device machine learning models, especially end-to-end (E2E) ASR systems.
Session Outline:
Attendees of "ML on-device: building efficient models" will gain a deep understanding of methods for developing efficient machine learning models on edge devices and the importance of model optimization to ensure real-time, low-latency inference without compromising too much model performance. The talk will also cover various common model compression techniques such as quantization, and pruning. After the talk, the attendees will be equipped with the necessary knowledge to understand how to build efficient on-device machine learning models in different domains.
Background Knowledge:
PyTorch, PySpeech
Bio: Danni Li is an AI Resident at Meta. She is interested in building efficient AI systems and applications to solve real-world problems. Her current research focuses on on-device ASR models and optimization techniques.