Abstract: Millions of critical real-time decisions are made each day by online Machine Learning models at Lyft to shape how riders move and how drivers earn. To enable these decisions efficiently at scale, we grappled with several technical challenges:
(1) How could we design a serving system that performs model inference within single-digit-millisecond latencies at a throughput of 1,000,000+ requests per second?
(2) How would we make such a system support model sizes ranging from a few kilobytes to gigabytes, with model update periods as short as a couple of minutes?
(3) How could we empower 40+ teams with use cases spanning fraud detection, pricing, safety, ETAs, and more to use any modeling library they choose, so they can ship effective models quickly and without constraints?
We built LyftLearn Serving, a scalable, flexible, distributed online model serving system to overcome these challenges. In this talk, we give an overview of the online model serving requirements at Lyft that drove us to build LyftLearn Serving. We showcase the techniques we used to tackle the aforementioned challenges and achieve a low-latency, high-throughput model serving system powering the products of 40+ teams. We also present the design decisions we made in LyftLearn Serving for efficient versioning, deploying, testing, and monitoring of ML models, and describe tradeoffs that can help and inspire MLOps practitioners building similar systems.
Bio: Mihir Mathur is the lead Product Manager for Machine Learning at Lyft, where he works on building ML/AI tools that power Lyft’s automated intelligent decisions across real-time pricing, ETAs, fraud detection, safety classification, and more. In the past, Mihir has built delightful products for millions of users at Quora, Houzz, and Thomson Reuters, and has spoken about his work at conferences such as MLOps World and ODSC. Mihir graduated magna cum laude from UCLA with a Bachelor’s and a Master’s in Computer Science.