Reckoning with the Disagreement Problem: Post-Hoc Explanation Agreement as a Training Objective

Abstract: 

As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner has become a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods that assign each feature in a model’s input a score reflecting that feature’s influence on the model’s output. A major practical limitation of these explainers is that they can disagree about which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind. Alongside the standard loss term corresponding to model performance, we include an additional term that measures the difference in feature attribution between a pair of explainers. In our experiments on three datasets, we observe that models trained with this loss term can show improved explanation consensus on unseen data, and also between explainers other than the pair used in the loss term. However, this improved consensus comes at a cost to model performance. Finally, we study how our method influences model outputs and explanations.
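As a rough illustration of the loss described in the abstract, the sketch below adds a disagreement penalty between two attribution methods to a standard task loss. Everything specific here is an assumption made for illustration and is not taken from the talk: the tiny logistic model, the choice of explainers (gradient-times-input and single-feature occlusion), the use of one minus the Pearson correlation as the disagreement measure, and the weight lam. It only evaluates the loss for one example; actual training would minimize it with an automatic differentiation library.

```python
import math


def predict(weights, bias, x):
    """Toy logistic model: sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def grad_times_input(weights, bias, x):
    """Hypothetical explainer A: (d output / d x_i) * x_i for each feature."""
    p = predict(weights, bias, x)
    return [p * (1.0 - p) * w * xi for w, xi in zip(weights, x)]


def occlusion(weights, bias, x, baseline=0.0):
    """Hypothetical explainer B: output drop when feature i is set to baseline."""
    p = predict(weights, bias, x)
    scores = []
    for i in range(len(x)):
        x_occluded = list(x)
        x_occluded[i] = baseline
        scores.append(p - predict(weights, bias, x_occluded))
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length score lists (0.0 if degenerate)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0.0 else 0.0


def task_loss(weights, bias, x, y):
    """Standard binary cross-entropy for one example with label y in {0, 1}."""
    p = predict(weights, bias, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def disagreement(weights, bias, x):
    """Assumed penalty: 1 - correlation between the two attribution vectors."""
    attr_a = grad_times_input(weights, bias, x)
    attr_b = occlusion(weights, bias, x)
    return 1.0 - pearson(attr_a, attr_b)


def combined_loss(weights, bias, x, y, lam):
    """Task loss plus lam times the explainer disagreement penalty."""
    return task_loss(weights, bias, x, y) + lam * disagreement(weights, bias, x)


# Example values chosen arbitrarily for illustration.
weights, bias = [2.0, -1.0, 0.5], 0.1
x, y = [1.0, 2.0, -1.0], 1
print(combined_loss(weights, bias, x, y, lam=0.5))
```

Setting lam to zero recovers ordinary training on the task loss alone, while larger values trade task performance for agreement, which mirrors the performance cost the abstract reports.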

Bio: 

Avi Schwarzschild is a research fellow at Arthur and a fifth-year PhD student in the Applied Math and Scientific Computation program at the University of Maryland. His work at Arthur focuses on explainability tools for neural networks. At the University of Maryland, he is advised by Tom Goldstein on his work in deep learning. His general interests range from security to generalization and interpretability, and he is trying to expand our understanding of when and why neural networks work.

Open Data Science
