Abstract: An important practical challenge is to develop theoretically principled tools that can guide the use of production-scale deep neural networks. We will describe recent work that uses spectral-based methods from scientific computing and statistical mechanics to develop such tools. Among other things, these tools can be used to develop metrics characterizing the quality of models without even examining training or test data, and to predict trends in generalization (and not just bounds on generalization) for state-of-the-art production-scale models. Related tools can exploit adversarial data to characterize and modify the curvature properties of the penalty landscape, and to perform tasks such as model quantization in a more automated way. We will cover the basic ideas underlying these methods, illustrate their use for analyzing production-scale deep neural networks in computer vision, natural language processing, and related areas, and walk participants through how to use these tools, as implemented in the publicly available "weightwatcher" Python package.
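The spectral, data-free quality metrics mentioned above can be sketched in a few lines of numpy. The sketch below is illustrative only, not the weightwatcher API: the function names `esd` and `hill_alpha` are invented for this example. It computes the empirical spectral density (ESD) of a weight matrix and estimates a power-law tail exponent alpha with a Hill estimator, the style of metric used in this line of work.

```python
import numpy as np

def esd(W):
    """Empirical spectral density: eigenvalues of the
    correlation matrix X = W^T W / N, N = max dimension of W."""
    N = max(W.shape)
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)

def hill_alpha(eigs, k=None):
    """Hill estimator of the power-law tail exponent alpha,
    fit on the k largest eigenvalues of the ESD."""
    eigs = np.sort(eigs)[::-1]
    eigs = eigs[eigs > 0]
    if k is None:
        k = len(eigs) // 2
    tail = eigs[:k]
    # alpha = 1 + k / sum(log(lambda_i / lambda_k)), always > 1
    return 1.0 + k / np.sum(np.log(tail / tail[-1]))

# Toy example: a random (untrained-style) weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((300, 100))
eigs = esd(W)
alpha = hill_alpha(eigs)
print(f"alpha = {alpha:.2f}")
```

In the heavy-tailed self-regularization picture, well-trained layers tend to show smaller alpha (heavier spectral tails) than random matrices like this one; the weightwatcher package automates this analysis layer by layer across a whole model.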
Bio: Charles Martin holds a PhD in Theoretical Chemistry from the University of Chicago. He was then an NSF Postdoctoral Fellow in a Theoretical Physics group at UIUC that studied the statistical mechanics of neural networks. He currently owns and operates Calculation Consulting, a boutique consultancy specializing in ML and AI that supports clients doing applied research in AI. He maintains a well-recognized blog on practical ML theory, and he has supported and performed the work to date on Implicit and Heavy-Tailed Self-Regularization in Deep Learning.