
Abstract: Implementing state-of-the-art architectures, tuning model hyperparameters, and optimizing loss functions are the fun parts of computer vision. Appealing as that work may seem, behind each model deployed into production are the data labelers and data engineers responsible for building the high-quality training dataset that serves as the model’s input.
In this talk, I will provide an overview of DataOps for computer vision, outline three data-related challenges that every computer vision team has to deal with, and propose specific functions of an ideal DataOps platform to address these challenges.
Session Outline
1 - An Overview of DataOps for Computer Vision
Defining DataOps in the context of Data Analytics
Making the case for bringing DataOps to Computer Vision: (1) data is more important than models, (2) unstructured data preparation is challenging, and (3) building computer vision products is iterative
Laying out 6 DataOps principles for computer vision: (1) software engineering lifecycle, (2) continuous integration/delivery, (3) continuous testing, (4) observability, (5) data semantics, and (6) team collaboration
2 - The 3 Data-Related Challenges for Computer Vision Teams
Curate high-quality data points
Label and audit data at a massive scale
Account for data drift
3 - The 3 Specific Functions of an Ideal DataOps Platform
- Data Curation: (1) visualize massive datasets, (2) discover and retrieve data with ease, (3) curate diverse scenarios, and (4) integrate seamlessly with existing workflows and tools
- Data Annotation: (1) label large datasets instantly, (2) audit hard labels manually, and (3) detect mislabeled data points rapidly
- Data Observability: (1) detect when data drift happens, (2) analyze where and why drift happens, and (3) overcome drift and improve performance
Background Knowledge
- Understand the Machine Learning development lifecycle
- Know the difference between academic ML and production ML
- Have exposure to the MLOps tooling ecosystem
Bio: James Le currently runs Data Relations at Superb AI, a Series A ML data management startup. As part of his role, James executes content and partnership initiatives while working cross-functionally with growth, product, customer success, sales, marketing, and community functions to drive Go-To-Market strategy.
Before joining Superb AI, he completed his Computer Science Master's degree at RIT, where his thesis research focused on the intersection of deep learning and recommendation systems. Outside of work, he is highly active in the broader data and ML community: writing data-centric blog posts, hosting a data-focused podcast, and teaching an online course for ML practitioners.

JAMES LE
Title
DataOps/MLOps Practitioner | AI Safety Researcher | Superb AI Inc.
