MASCLE Machine Learning Seminar: Joan Bruna (NYU) - On (Provably) Learning with Large Neural Networks
Thu, Oct 31, 2019 @ 03:30 PM - 04:50 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Joan Bruna, New York University
Talk Title: On (Provably) Learning with Large Neural Networks
Series: Machine Learning Seminar Series hosted by USC Machine Learning Center
Abstract: Virtually all modern deep learning systems are trained with some form of local descent algorithm over a high-dimensional parameter space. Despite its apparent simplicity, the mathematical picture of the resulting setup contains several mysteries that combine statistics, approximation theory and optimization, all intertwined in a curse of dimensionality.
In order to make progress, authors have focused on the so-called 'overparametrised' regime, which studies asymptotic properties of the algorithm as the number of neurons grows. In particular, neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for their favorable training properties. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that, for networks with a single hidden layer and under appropriate assumptions, converges to a globally optimal solution.
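The setting described above can be illustrated with a minimal sketch (not from the talk; all names, the toy target, and the learning rate are hypothetical choices): a single-hidden-layer ReLU network with mean-field 1/m scaling, where gradient descent moves an empirical measure over m neurons.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: f(x) = (1/m) * sum_i c_i * relu(w_i . x),
# viewed in the mean-field picture as an empirical measure over the
# neuron parameters (c_i, w_i).
m, d, n = 512, 2, 200                     # neurons, input dim, samples
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                       # hypothetical regression target

W = rng.normal(size=(m, d))               # hidden-layer weights
c = rng.normal(size=m)                    # output weights

def predict(W, c, X):
    # 1/m mean-field normalization of the output layer
    return (np.maximum(X @ W.T, 0.0) @ c) / m

mse_before = np.mean((predict(W, c, X) - y) ** 2)

lr = 0.1
for _ in range(300):
    h = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
    r = predict(W, c, X) - y              # residuals
    # Gradients of the squared loss (1/2n) * sum_j (f(x_j) - y_j)^2
    grad_c = (h.T @ r) / (n * m)
    grad_W = ((h > 0) * (r[:, None] * c[None, :])).T @ X / (n * m)
    # Scale the step by m so each neuron moves at an O(1) rate,
    # matching the mean-field time parametrization.
    c -= lr * m * grad_c
    W -= lr * m * grad_W

mse_after = np.mean((predict(W, c, X) - y) ** 2)
```

As m grows, the trajectory of this empirical measure is described by the deterministic transport PDE mentioned in the abstract.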
In this talk, we will review recent progress on this problem and argue that such a framework might provide crucial robustness against the curse of dimensionality. First, we will describe a non-local mass transport dynamics that leads to a modified PDE with the same minimizer, which can be implemented as a stochastic neuronal birth-death process and provably accelerates the rate of convergence in the mean-field limit. Next, these dynamics fit naturally within the framework of total-variation regularization, which, following [Bach'17], has fundamental advantages in the high-dimensional regime. We will discuss a unified framework that controls optimization, approximation and generalization errors using large deviation principles, and discuss current open problems in this research direction.
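The neuronal birth-death idea can be caricatured as a resampling step interleaved with gradient descent (a sketch under loose assumptions, not the authors' actual scheme; the fitness potential and selection rule below are illustrative choices): neurons whose contribution currently increases the loss are killed, and useful neurons are duplicated, keeping the population size fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population of m ReLU neurons for a toy regression task.
m, d, n = 256, 2, 100
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])
W = rng.normal(size=(m, d))
c = rng.normal(size=m)

def birth_death_step(W, c, X, y, beta=1.0):
    """One illustrative birth-death resampling of the neuron population."""
    h = np.maximum(X @ W.T, 0.0)          # (n, m) activations
    r = (h @ c) / len(c) - y              # residuals under 1/m scaling
    # Per-neuron potential: correlation of the neuron's output with the
    # residual. High potential = the neuron is hurting the fit.
    V = c * (h.T @ r) / len(r)
    # Low-potential neurons reproduce, high-potential neurons die;
    # the total mass (population size m) is conserved.
    p = np.exp(-beta * (V - V.min()))
    p /= p.sum()
    idx = rng.choice(len(c), size=len(c), p=p)
    return W[idx].copy(), c[idx].copy()

W2, c2 = birth_death_step(W, c, X, y)
```

In the mean-field limit, adding such a reaction term to the transport PDE is what yields the accelerated convergence rate discussed in the talk.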
Joint work with G. Rotskoff (NYU), Z. Chen (NYU), S. Jelassi (NYU) and E. Vanden-Eijnden (NYU).
This lecture satisfies requirements for CSCI 591: Research Colloquium.
Biography: Joan Bruna is an Assistant Professor at Courant Institute, New York University (NYU), in the Department of Computer Science, Department of Mathematics (affiliated) and the Center for Data Science, since Fall 2016. He belongs to the CILVR group and to the Math and Data groups. From 2015 to 2016, he was Assistant Professor of Statistics at UC Berkeley and part of BAIR (Berkeley AI Research). Before that, he worked at FAIR (Facebook AI Research) in New York. Prior to that, he was a postdoctoral researcher at Courant Institute, NYU. He completed his PhD in 2013 at Ecole Polytechnique, France. Before his PhD he was a Research Engineer at a semiconductor company, developing real-time video processing algorithms. Even before that, he completed an MSc at Ecole Normale Superieure de Cachan in Applied Mathematics (MVA) and a BA and MS at UPC (Universitat Politecnica de Catalunya, Barcelona) in both Mathematics and Telecommunication Engineering. For his research contributions, he has been awarded a Sloan Research Fellowship (2018), an NSF CAREER Award (2019) and a best paper award at ICMLA (2018).
Host: Yan Liu
Audiences: Everyone Is Invited
Contact: Computer Science Department