
Abstract: Real-time model serving is a crucial capability for delivering value from data science projects. Unfortunately, many existing REST endpoint implementations cannot scale to high-volume, low-latency applications. In particular, existing ML platforms with REST serving capabilities can fail in production because their real-time serving infrastructure was not designed to scale while maintaining performance SLAs. In this session, you will learn: the problems with existing REST endpoints and why they are not production grade; which technologies you can use to build a REST endpoint, and their pros and cons; how to build a scalable REST endpoint with Flask, uWSGI, and NGINX; what deploying a REST endpoint with this technology stack looks like in the real world; and what kind of performance you can expect from this type of infrastructure.
Bio: Lior Amar is the Principal Engineer at ParallelM, where he is responsible for the MCenter platform. He is an expert with 20 years’ experience in distributed systems development, low-level system programming, and high-performance computing (HPC) cluster management and Linux systems. Before joining ParallelM, Lior was a government researcher working on HPC. Before that, he was the Founder and CTO of Cluster Logic, a distributed systems consulting company. He has a Ph.D. and a Master’s degree in Computer Science focused on distributed systems.