Abstract: As data pipelines become increasingly complex and data volumes grow at a dizzying rate, maintaining observability into your data and model health is more critical than ever. Each phase of a data pipeline introduces potential points of failure that are notoriously difficult and time-consuming to detect and troubleshoot. To make matters more challenging, the world around us is constantly changing, and so is our data. Without proper observability, models can fail without warning, resulting in large costs to your business. Worse yet, they can fail silently, and you won’t even know. With the rise of MLOps, researchers and ML practitioners have joined forces to develop methodologies for monitoring these systems for issues such as data drift, concept drift, and data quality degradation. However, monitoring remains a distinct challenge in the domain of computer vision.
In this talk, Ray Reed will explore the many problems that affect data and model health in computer vision systems specifically, and lay out an approach for detecting, debugging, and reacting to these issues at scale. For tabular data, powerful monitoring capabilities can be achieved by collecting telemetry such as missing value ratios, cardinality of discrete features, and descriptive statistics. For image and video data, however, the metrics of interest must first be derived from the raw data.
Ray will discuss the types of metrics that must be tracked in order to answer the most important questions about image and video data health, and to successfully debug the elusive problems faced in the computer vision domain. Lastly, he will discuss how proper visualization and anomaly detection can dramatically reduce the time spent on these monitoring and debugging tasks, and help maintain observability into your computer vision system’s health at all times.
Bio: Ray is a Customer Success Data Scientist at WhyLabs, the AI Observability company. He has a long-held passion for machine learning and loves helping customers save time and money by monitoring their ML systems at scale. Ray was formerly a Senior Success Engineer at Datorama, a Salesforce Company, where he drove success for large enterprise customers with a focus on improving query performance across the company. In his spare time, Ray enjoys hiking, music, and more hiking.