Abstract: Millions of critical real-time decisions are made each day by online Machine Learning models at Lyft to shape how riders move and how drivers earn. To enable these decisions efficiently at scale, we grappled with several technical challenges:
(1) How could we design a serving system that performs model inference with single-digit millisecond latency at a throughput of 1,000,000+ requests per second?
(2) How would we make such a system support models ranging in size from a few kilobytes to gigabytes, with model updates as frequent as every couple of minutes?
(3) How can we empower 40+ teams, with use cases spanning fraud detection, pricing, safety, ETAs, and more, to use any modeling library they choose so that they can ship effective models fast, without constraints?
We built LyftLearn Serving, a scalable, flexible, distributed online model serving system, to overcome these challenges. In this talk, we give an overview of the online model serving requirements at Lyft that drove us to build LyftLearn Serving. We showcase the techniques we used to tackle these challenges and achieve a low-latency, high-throughput model serving system that powers the products of 40+ teams. We also present the design decisions we made in LyftLearn Serving for efficiently versioning, deploying, testing, and monitoring ML models, and describe tradeoffs that can help and inspire MLOps practitioners building similar systems.
Bio: Hakan is a staff software engineer on the ML Platform team at Lyft. The team builds ML development, training, and serving systems that support 40+ teams. Previously, Hakan was a staff engineer at Box, where he helped build cloud content management applications focused on security and scaled Kubernetes clusters and service meshes in on-premises infrastructure. He started his career at the hardware level, building ASICs, and transitioned to distributed systems software at a startup. Hakan is passionate about wearing many hats, switching abstraction levels, operational excellence, and mentorship, and he loves challenges and problems that take the whole team to solve.