Bias is Good: Arguments for Programmatic Labeling

Abstract: 

Hand labeling, a fundamental part of human-mediated machine intelligence today, is akin to scribes hand-copying books after Gutenberg. The process is also naive, dangerous, and expensive given the growing set of alternatives, which includes semi-supervised learning, weak supervision, and active learning.

The significant issues with hand labeling include the introduction of bias (hand labels are neither interpretable nor explainable), prohibitive costs (both in money and in the time of subject matter experts), and the absence of true gold labels (even the most well-known hand-labeled datasets have label error rates of at least 5%!).

We will explore how hand labeling has been negatively impacting ML solutions in production today, navigate the world of alternatives, and provide a framework for deciding when to turn to programmatic or manual annotation.

Bio: 

Shayan Mohanty is the CEO and Co-Founder of Watchful, a company that largely automates the process of creating labeled training data. He has spent over a decade leading data engineering teams at various companies, including Facebook, where he led the stream processing team responsible for processing 100% of the ads metrics data for all FB products. He is also a Guest Scientist at Los Alamos National Laboratory and has given talks on topics ranging from Automata Theory to Machine Teaching.

Open Data Science
One Broadway
Cambridge, MA 02142
info@odsc.com
