Abstract: Weakly supervised approaches have gained popularity in the last two years, but there is still significant overhead in applying these methods to more complex NLP tasks. The performance of a weakly supervised system depends on both the quality and the quantity of its independent sources of weak signal: if a practitioner cannot come up with enough sources themselves, weak supervision is largely impractical.
To overcome this, we can use techniques that interactively generate candidate sources of weak supervision to guide the practitioner, making weak supervision practical for many tasks that would otherwise be difficult to support. In this tutorial, we’ll first build a basic weakly supervised system for an NLP task, and then augment it with some of these generative techniques to speed up the iterative process.
Lesson 1: Foundational Weak Supervision
Familiarize yourself with concepts from weak supervision by implementing your first weakly supervised system from scratch. Learn the primitives that go into any weakly supervised system, and build an intuition about some of the inner workings.
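As a rough illustration of the primitives involved, the sketch below shows labeling functions that vote or abstain, a label matrix, and a majority-vote combiner. It is not the tutorial's own code: the spam/ham task, the function names, and the heuristics are all assumptions chosen for brevity.

```python
from collections import Counter

# Hypothetical label scheme for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Each labeling function encodes one noisy heuristic and may abstain.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]

def apply_lfs(texts, lfs=LABELING_FUNCTIONS):
    """Build the label matrix: one row per text, one column per labeling function."""
    return [[lf(t) for lf in lfs] for t in texts]

def majority_vote(votes):
    """Combine one row of votes; abstain when nothing fires or the top vote is tied."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

texts = [
    "Claim your prize at http://example.com now friend",
    "see you soon",
    "The quarterly report is attached for your review today",
]
label_matrix = apply_lfs(texts)
weak_labels = [majority_vote(row) for row in label_matrix]
print(weak_labels)
```

Majority vote is the simplest possible combiner; it treats every source as equally reliable, which is one reason the number and independence of sources matter so much.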
Lesson 2: Probabilistic Weak Label Evaluation
Evaluating weakly supervised labels is tricky, and using probabilistic labels in conventional model architectures is even trickier. Work through a simple implementation that should help you build an understanding of how to apply these techniques to more complex tasks.
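One common way to use probabilistic labels with a conventional classifier is to replace the one-hot target in cross-entropy with the full label distribution. The sketch below assumes labels are derived by normalizing vote counts, which is a simplification and not necessarily the method used in the tutorial; the function names are hypothetical.

```python
import math

def votes_to_probs(votes, n_classes=2, abstain=-1):
    """Turn one row of weak votes into a probability distribution over classes.

    Assumption: every source is weighted equally. Rows with no votes fall back
    to a uniform distribution.
    """
    counts = [0] * n_classes
    for v in votes:
        if v != abstain:
            counts[v] += 1
    total = sum(counts)
    if total == 0:
        return [1.0 / n_classes] * n_classes
    return [c / total for c in counts]

def soft_cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Cross-entropy against a soft target: -sum_k target_k * log(pred_k)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(pred_probs, target_probs))

target = votes_to_probs([1, 1, 0, -1])        # two votes for class 1, one for class 0
confident_model = [0.05, 0.95]                # hypothetical model output
hedged_model = [0.35, 0.65]                   # hypothetical model output
loss_confident = soft_cross_entropy(confident_model, target)
loss_hedged = soft_cross_entropy(hedged_model, target)
print(target, loss_confident, loss_hedged)
```

With a soft target, a model that hedges in proportion to the disagreement among sources is penalized less than one that is overconfident, which is the behavior a hard-label loss cannot express.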
Lesson 3: Weak Signal Generation
Complete the loop by programmatically generating some sources of weak supervision and measuring how overall label quality is affected. Learn about some potential generation techniques and their limitations.
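As one possible generation technique (an assumption for illustration, not necessarily what the tutorial covers), candidate keyword rules can be mined from a small labeled seed set and ranked by coverage and precision, leaving the practitioner to accept or reject each suggestion. The function name and thresholds below are hypothetical.

```python
from collections import defaultdict

def propose_keyword_rules(texts, labels, min_coverage=2, min_precision=0.9):
    """Suggest (token, label, coverage, precision) rules from a labeled seed set.

    coverage  = number of seed texts containing the token
    precision = share of those texts carrying the token's most frequent label
    """
    token_label_counts = defaultdict(lambda: defaultdict(int))
    for text, label in zip(texts, labels):
        for token in set(text.lower().split()):   # count each token once per text
            token_label_counts[token][label] += 1

    candidates = []
    for token, by_label in token_label_counts.items():
        coverage = sum(by_label.values())
        best_label, hits = max(by_label.items(), key=lambda kv: kv[1])
        precision = hits / coverage
        if coverage >= min_coverage and precision >= min_precision:
            candidates.append((token, best_label, coverage, precision))

    # Highest coverage first, then precision, then alphabetical for stable output.
    candidates.sort(key=lambda c: (-c[2], -c[3], c[0]))
    return candidates

seed_texts = [
    "win a free prize today",
    "free prize inside click now",
    "lunch at noon today",
    "meeting moved to noon",
]
seed_labels = [1, 1, 0, 0]   # 1 = spam, 0 = ham (illustrative seed data)

for token, label, coverage, precision in propose_keyword_rules(seed_texts, seed_labels):
    print(f"if '{token}' in text -> label {label} (coverage={coverage}, precision={precision:.2f})")
```

The limitation is visible even at this scale: rules mined from a tiny seed set overfit to it, and tokens such as "today" that appear under both labels are correctly filtered out only because the seed happens to contain both cases.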
Bio: Shayan Mohanty is the CEO and Co-Founder of Watchful, a company that largely automates the process of creating labeled training data. He has spent over a decade leading data engineering teams at various companies, including Facebook, where he served as lead for the stream processing team responsible for processing 100% of the ads metrics data for all FB products. He is also a Guest Scientist at Los Alamos National Laboratory and has given talks on topics ranging from Automata Theory to Machine Teaching.