Abstract: Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. What if machines could learn with fewer labeled examples? This talk focuses on two enabling techniques. First, we explore an algorithmic solution that relies on collaboration between humans and machines to label smartly. Second, we dig into recent advancements that enable machines to understand language even when there is limited labeled data.
Being able to teach machines with examples is a powerful capability, but it hinges on the availability of vast amounts of data. The data not only needs to exist, but has to be in a form that allows relationships between input features and output to be uncovered. Creating a label for each input example fulfills this requirement, but is an expensive undertaking.
One classical approach to this problem relies on human and machine collaboration. With this approach, engineered heuristics are used to smartly select the “best” instances of data to label, in order to reduce cost. A human steps in to provide the label. The model then learns from this smaller labeled dataset. Recent advancements have made these heuristics amenable to deep learning, enabling models to be built with limited labeled data.
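The select, label, and retrain loop described above can be sketched in a few lines. This is a minimal illustration of pool-based active learning with uncertainty sampling, one common selection heuristic; it is not the prototype from the talk. The toy nearest-centroid model, the `oracle` function standing in for the human labeler, and the `x > 0.5` ground truth are all assumptions made for illustration.

```python
import math
import random


def fit(labeled):
    """Toy model: store the mean of the labeled points for each class."""
    means = {}
    for cls in (0, 1):
        xs = [x for x, y in labeled if y == cls]
        means[cls] = sum(xs) / len(xs)
    return means


def predict_proba(x, centroids):
    """Probability of class 1, from a softmax over negative centroid distances."""
    e0 = math.exp(-abs(x - centroids[0]))
    e1 = math.exp(-abs(x - centroids[1]))
    return e1 / (e0 + e1)


def most_uncertain(pool, centroids):
    """Uncertainty sampling: pick the point whose probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x, centroids) - 0.5))


def oracle(x):
    """Stand-in for the human annotator (hypothetical ground truth: x > 0.5)."""
    return int(x > 0.5)


def active_learning(pool, seed_labeled, budget):
    """Repeatedly retrain, query the most uncertain point, and ask for its label."""
    labeled = list(seed_labeled)
    pool = list(pool)  # copy so the caller's pool is not modified
    for _ in range(budget):
        centroids = fit(labeled)
        x = most_uncertain(pool, centroids)
        pool.remove(x)
        labeled.append((x, oracle(x)))
    return fit(labeled), labeled


random.seed(0)
pool = [random.random() for _ in range(200)]
seed_labeled = [(0.0, 0), (1.0, 1)]  # one labeled example per class to start
centroids, labeled = active_learning(pool, seed_labeled, budget=10)
```

Only 10 of the 200 pool points are sent to the labeler here. In practice the model, the uncertainty measure, and the labeling budget would all be chosen for the task at hand.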
Another approach relies on building baseline models that can be reused for other applications that have only limited labeled data. Until recently, this was a challenge for natural language.
In this talk, we explore algorithmic approaches that drive both capabilities, and provide practical guidance for translating these capabilities into production. We provide intuition for how and why these algorithms work by demoing and describing how we built a working prototype.
Bio: Shioulin Sam is a research engineer at Cloudera Fast Forward Labs. In her previous life, she was an angel investor focusing on women-led start-ups. She also worked in the investment management industry designing quantitative trading strategies. She holds a Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.