DataOps For the Modern Computer Vision Stack


Implementing state-of-the-art architectures, tuning model hyperparameters, and optimizing loss functions are the fun parts of computer vision. However glamorous that work may look, behind every model deployed to production are data labelers and data engineers who build the high-quality training dataset the model learns from.

In this talk, I will provide an overview of DataOps for computer vision, outline the three data-related challenges that every computer vision team has to deal with, and propose specific functions of an ideal DataOps platform to address them.

Session Outline
1 - An Overview of DataOps for Computer Vision

Defining DataOps in the context of Data Analytics
Making the case for bringing DataOps to Computer Vision: (1) data is more important than models, (2) unstructured data preparation is challenging, and (3) building computer vision products is iterative
Laying out 6 DataOps principles for computer vision: (1) software engineering lifecycle, (2) continuous integration/delivery, (3) continuous testing, (4) observability, (5) data semantics, and (6) team collaboration
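To make the "continuous testing" principle concrete, here is a minimal sketch of an automated check that could run on every new batch of annotations, much like a unit test runs on every code change. The `BoxAnnotation` record, the `validate_annotations` helper, and the specific checks are hypothetical illustrations, not part of the talk or of any particular platform.

```python
from dataclasses import dataclass


@dataclass
class BoxAnnotation:
    # Hypothetical bounding-box record; real label formats vary by tool.
    image_id: str
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def validate_annotations(annotations, image_sizes, allowed_labels):
    """Return a list of (image_id, problem) tuples; an empty list means all checks passed.

    image_sizes maps image_id -> (width, height); allowed_labels is the label taxonomy.
    """
    problems = []
    for ann in annotations:
        if ann.image_id not in image_sizes:
            problems.append((ann.image_id, "unknown image"))
            continue
        width, height = image_sizes[ann.image_id]
        if ann.label not in allowed_labels:
            problems.append((ann.image_id, f"unexpected label {ann.label!r}"))
        # A valid box has positive area and lies entirely inside the image.
        if not (0 <= ann.x_min < ann.x_max <= width and 0 <= ann.y_min < ann.y_max <= height):
            problems.append((ann.image_id, "box outside image or degenerate"))
    return problems
```

In a CI/CD-style pipeline, a non-empty result would block the batch from entering the training set until the flagged labels are reviewed.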

2 - The 3 Data-Related Challenges for Computer Vision Teams

Curate high-quality data points
Label and audit data at a massive scale
Account for data drift
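One simple way to account for data drift is to compare the distribution of some per-image statistic in a reference (training) set against recent production images. The sketch below assumes mean grayscale brightness as that statistic and uses a two-sample Kolmogorov-Smirnov test with its standard large-sample critical value; both the feature and the test are illustrative choices, not the method described in the talk.

```python
import math
from bisect import bisect_right


def mean_brightness(image):
    """Average pixel intensity of a grayscale image given as a list of rows (values 0-255)."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count


def ks_statistic(reference, production):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    ref, prod = sorted(reference), sorted(production)
    pooled = sorted(set(ref) | set(prod))
    # The supremum of |F_ref - F_prod| is attained at one of the observed values.
    return max(
        abs(bisect_right(ref, v) / len(ref) - bisect_right(prod, v) / len(prod))
        for v in pooled
    )


def detect_drift(reference, production, alpha=0.05):
    """Flag drift when the KS statistic exceeds the large-sample critical value at level alpha."""
    n, m = len(reference), len(production)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    d = ks_statistic(reference, production)
    return d > critical, d, critical
```

In practice, one would compute `mean_brightness` for each image in a reference batch and a production batch, then pass the two lists to `detect_drift`; richer features such as model embeddings can be monitored the same way, one dimension or summary statistic at a time.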

3 - The 3 Specific Functions of an Ideal DataOps Platform

- Data Curation: (1) visualize massive datasets, (2) discover and retrieve data with ease, (3) curate diverse scenarios, and (4) integrate seamlessly with existing workflows and tools
- Data Annotation: (1) label large datasets instantly, (2) audit hard labels manually, and (3) detect mislabeled data points rapidly
- Data Observability: (1) detect when data drift happens, (2) analyze where and why drift happens, and (3) overcome drift and improve performance
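Detecting mislabeled data points and auditing hard labels can be combined into a review queue. The sketch below uses a simple heuristic, assumed here for illustration: flag a sample when a model assigns low probability to its given label and prefers a different class, then sort the flagged samples so the least plausible labels are audited first. The `flag_suspect_labels` helper and the 0.2 threshold are hypothetical; the probabilities should ideally come from a model that did not train on those samples (for example, cross-validated predictions).

```python
def flag_suspect_labels(given_labels, predicted_probs, threshold=0.2):
    """Return (index, given_label, predicted_label, p_given) tuples for likely label errors.

    predicted_probs is a list of dicts mapping class name -> predicted probability.
    """
    suspects = []
    for idx, (label, probs) in enumerate(zip(given_labels, predicted_probs)):
        p_given = probs.get(label, 0.0)
        best_label = max(probs, key=probs.get)
        # Low confidence in the given label AND a different preferred class -> suspect.
        if p_given < threshold and best_label != label:
            suspects.append((idx, label, best_label, p_given))
    # Least plausible labels first, so human auditors see the worst cases early.
    suspects.sort(key=lambda s: s[3])
    return suspects
```

Routing only this queue to human reviewers reflects the split above between labeling at scale and auditing hard labels manually.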

Background Knowledge
- Understand the machine learning development lifecycle
- Know the difference between academic ML and production ML
- Exposure to the MLOps tooling ecosystem


James Le currently runs Data Relations at Superb AI, a Series A ML data management startup. In this role, James executes content and partnership initiatives while working cross-functionally with the growth, product, customer success, sales, marketing, and community teams to drive go-to-market strategy.

Before joining Superb AI, he completed a Master's degree in Computer Science at RIT, where his thesis research sat at the intersection of deep learning and recommendation systems. Outside of work, he is highly active in the broader data and ML community: he writes data-centric blog posts, hosts a data-focused podcast, and teaches an online course for ML practitioners.

Open Data Science
One Broadway
Cambridge, MA 02142