Abstract: While linear-algebra-based optimization algorithms have been widely used both in machine learning and data analysis and in scientific computing, until recently there was an important difference: algorithms in scientific computing tended to be compute-intensive, while algorithms in machine learning and data analysis tended to be data-intensive. In particular, the latter did not stress-test many of the advanced optimization techniques developed in scientific computing over the past several decades. Recently, that has begun to change, driven by deep learning applications, but also by applications where one wants more control over the output of algorithms. Here, we will describe recent work on developing numerically-intensive methods for large-scale machine learning. Among other things, we'll discuss large-scale low-rank matrix computations, both within and outside frameworks such as Apache Spark, in comparison to lower-level frameworks such as MPI; our efforts to get the "best of both worlds"; and how these methods might be used for better distributed optimization algorithms.
Bio: Michael Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he spent fall 2013 at UC Berkeley co-organizing the Simons Foundation's program on the Theoretical Foundations of Big Data Analysis.