Abstract: Prediction rules in deep learning are based on a forward, recursive computation through several layers. Implicit rules go much further: the prediction relies on the solution of a fixed-point equation, which has to be solved numerically. The equation involves a single n-vector x that contains the “hidden” features.
At first glance, the above model class seems very specific. Perhaps surprisingly, most known neural network architectures can be cast in this form, including standard feedforward networks, CNNs, RNNs, and many more. Implicit rules allow for much wider classes of models, with a lot more capacity (number of parameters for a given dimension of the hidden features).
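To make the casting claim concrete, here is a minimal sketch for a two-layer ReLU network. The abstract does not write out the fixed-point equation, so the form x = relu(A x + B u), y = C x, the names A, B, C, and the toy weights W0, W1, W2 are assumptions chosen for illustration. Stacking the layer activations into one vector x gives a strictly upper block-triangular A, so the fixed point is reached after as many iterations as there are hidden layers.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
p, h1, h2, q = 3, 4, 5, 2            # input, hidden-1, hidden-2, output sizes (illustrative)
W0 = rng.standard_normal((h1, p))    # hypothetical layer weights
W1 = rng.standard_normal((h2, h1))
W2 = rng.standard_normal((q, h2))
u = rng.standard_normal(p)

# Explicit forward pass through the layers.
x1 = relu(W0 @ u)
x2 = relu(W1 @ x1)
y_explicit = W2 @ x2

# Same network in the assumed implicit form, with x = (x2, x1) stacked.
n = h2 + h1
A = np.zeros((n, n))
A[:h2, h2:] = W1                     # x2 depends on x1
B = np.zeros((n, p))
B[h2:, :] = W0                       # x1 depends on the input u
C = np.zeros((q, n))
C[:, :h2] = W2                       # output reads x2

# A is strictly upper block-triangular, so two sweeps reach the fixed point.
x = np.zeros(n)
for _ in range(2):
    x = relu(A @ x + B @ u)
y_implicit = C @ x
```

The two outputs agree because each sweep resolves one more layer, which is the recursive elimination that a forward pass performs.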
Recent work on implicit rules has demonstrated their potential. One of the thorny issues in implicit rules is well-posedness and numerical tractability: how can we guarantee that there exists a unique solution x, and if so, how can we solve for x efficiently? In standard networks, the issue is not present, since one can always express the hidden state variable in explicit form, thanks to a recursive elimination, that is, via a forward pass through the layers. For implicit models, one can derive simple conditions on the model parameters that guarantee both well-posedness and tractability.
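As an illustration of what such a condition can look like, the sketch below assumes the fixed-point equation has the form x = relu(A x + B u) and uses one simple sufficient condition: the infinity norm of A is strictly below 1. The abstract does not say which conditions it has in mind, so both the equation form and this particular test are assumptions. Under them, the map is a contraction in the infinity norm, a unique x exists, and plain fixed-point iteration converges to it.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def solve_hidden_state(A, B, u, tol=1e-10, max_iter=1000):
    """Fixed-point iteration x <- relu(A x + B u) (hypothetical helper)."""
    # Assumed sufficient condition for well-posedness: ||A||_inf < 1.
    if np.linalg.norm(A, np.inf) >= 1.0:
        raise ValueError("||A||_inf >= 1: this sketch cannot guarantee a unique solution")
    b = B @ u
    x = np.zeros(A.shape[0])
    for _ in range(max_iter):
        x_next = relu(A @ x + b)
        if np.max(np.abs(x_next - x)) <= tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(0)
n, p = 6, 3                               # illustrative sizes
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, np.inf)      # rescale so that ||A||_inf = 0.9
B = rng.standard_normal((n, p))
u = rng.standard_normal(p)

x = solve_hidden_state(A, B, u)
residual = np.max(np.abs(x - relu(A @ x + B @ u)))
```

The norm test is deliberately conservative: it is easy to check and to enforce during training, at the cost of rejecting some models that are in fact well-posed.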
The training problem for implicit learning can be addressed via standard unconstrained optimization methods that are popular in the deep learning community, such as stochastic gradient methods. However, this approach involves challenges: computing the gradients of x with respect to the model parameters is not easy. In addition, we must guarantee well-posedness of the prediction rule; properly handling the corresponding constraint requires constrained optimization, for example block-coordinate descent (BCD) methods. The nice aspect of BCD methods is their ability to handle a wide variety of constraints or penalties.
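One way to see where the gradient difficulty comes from is implicit differentiation of the fixed-point equation. The sketch below is not the method from the talk: it assumes the form x = tanh(A x + b), with tanh swapped in for smoothness and b standing in for the input term, and differentiates with respect to b only. Differentiating both sides gives (I - D A) dx = D db with D = diag(1 - x^2), so each gradient requires a linear solve rather than a simple backward pass. The result is compared against finite differences.

```python
import numpy as np

def solve(A, b, iters=500):
    """Fixed-point iteration for x = tanh(A x + b) (hypothetical helper)."""
    x = np.zeros(A.shape[0])
    for _ in range(iters):
        x = np.tanh(A @ x + b)
    return x

rng = np.random.default_rng(0)
n = 5                                     # illustrative size
A = rng.standard_normal((n, n))
A *= 0.5 / np.linalg.norm(A, np.inf)      # ||A||_inf = 0.5, so the map is a contraction
b = rng.standard_normal(n)

x = solve(A, b)

# Implicit differentiation: dx/db = (I - D A)^{-1} D, with D = diag(tanh'(A x + b)).
# At the fixed point x = tanh(A x + b), so tanh'(A x + b) = 1 - x**2.
D = np.diag(1.0 - x**2)
J = np.linalg.solve(np.eye(n) - D @ A, D)

# Central finite differences, one column per coordinate of b, as a sanity check.
eps = 1e-6
J_fd = np.column_stack([
    (solve(A, b + eps * e) - solve(A, b - eps * e)) / (2.0 * eps)
    for e in np.eye(n)
])
```

Gradients with respect to the entries of A follow from the same linear system by the chain rule; the well-posedness condition is what keeps I - D A invertible.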
There are many other benefits of implicit models. In the talk, I will provide an overview of implicit learning and detail some exciting developments towards robustness, interpretability, and architecture optimization.
Bio: Laurent graduated from Ecole Polytechnique (Palaiseau, France) in 1985, and obtained his PhD in Aeronautics and Astronautics at Stanford University in 1990. Laurent joined the EECS department at Berkeley in 1999, then went on leave in 2003-2006 to work for SAC Capital Management. He teaches optimization and data science in EECS and within the Masters of Financial Engineering at the Haas School of Business. Laurent's research focuses on sparse and robust optimization and applications to data science, with an emphasis on finance. In 2016 Laurent co-founded Kayrros S.A.S., a company that delivers physical asset information for the energy markets from various sources such as satellite imagery; in 2018 he co-founded SumUp Analytics, which provides high-speed streaming text analytics for business applications.