Abstract: Despite significant advances in interpretable machine learning in recent years, many ML models---especially deep networks---remain difficult to understand and control. One promising new direction in interpretable deep learning aims to understand models by understanding their learned features and internal representations. This tutorial will survey state-of-the-art techniques for feature-level interpretability, with a focus on vision and language processing applications. We'll learn how to automatically discover and describe the function of individual neurons within deep networks, and how to use these descriptions to identify model failures and improve model robustness. This tutorial is targeted at learners who have experience with neural network models and are interested in gaining a deeper understanding of how they work.
Module 1: Visualizing and describing learned features. Learn about current techniques for automatically annotating neurons in deep networks with descriptions of their behavior.
Module 2: Extensions. Learn how to extend the techniques from Module 1 to describe other aspects of model behavior (e.g., directions in representation space rather than individual neurons) and to generate more detailed feature descriptions.
Module 3: Applications. Learn how to use neuron labels to identify sources of bias and unreliability in trained models, and explore techniques for using labels to improve models after they have been trained.
Prerequisites: Familiarity with deep learning basics.
Bio: Jacob Andreas is the X Consortium Assistant Professor at MIT. His research aims to build intelligent systems that can communicate effectively using language and learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. As a researcher at Microsoft Semantic Machines, he founded the language generation team and helped develop core pieces of the technology that powers conversational interaction in Microsoft Outlook. He has been the recipient of Samsung's AI Researcher of the Year award, MIT's Kolokotrones teaching award, and paper awards at NAACL and ICML.