AI SEMINAR-Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods
Fri, Aug 01, 2014 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Jascha Sohl-Dickstein, Khan Academy, Stanford University
Talk Title: Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods
Series: AISeminar
Abstract:
I will present an algorithm for performing minibatch optimization that combines the computational efficiency of stochastic gradient descent (SGD) with the second-order curvature information leveraged by quasi-Newton methods. These approaches are unified by maintaining an independent Hessian approximation for each minibatch. Each update step requires only a single minibatch evaluation (as in SGD), and each step is scaled using an approximate inverse Hessian, with little to no adjustment of hyperparameters required (as is typical for quasi-Newton methods). This algorithm is made tractable in memory and computational cost, even for high dimensional optimization problems, by storing and manipulating the quadratic approximations for each minibatch in a shared, time-evolving, low-dimensional subspace. Experimental results demonstrate improved convergence on seven diverse optimization problems. The algorithm is released as open source Python and MATLAB packages.
Optimizer available at:
https://github.com/Sohl-Dickstein/Sum-of-Functions-Optimizer
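The sketch below is purely illustrative and is not the released SFO package's API or the paper's implementation: it follows the abstract's description by keeping one quadratic (BFGS) approximation per minibatch, jointly minimizing the sum of those approximations at each step, and refreshing only a single minibatch's approximation per step. It uses full dense Hessian approximations on an assumed toy least-squares problem, whereas the actual method works in a shared, time-evolving, low-dimensional subspace for scalability.

# Illustrative sketch only, under the assumptions stated above.
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: least squares, split across minibatches (assumed for illustration).
n_batches, dim = 10, 5
A = [rng.standard_normal((20, dim)) for _ in range(n_batches)]
b = [rng.standard_normal(20) for _ in range(n_batches)]

def f_df(theta, i):
    """Value and gradient of the i-th minibatch term."""
    r = A[i] @ theta - b[i]
    return 0.5 * r @ r, A[i].T @ r

def bfgs_update(H, s, y):
    """Standard BFGS update of a Hessian approximation from step s and gradient change y."""
    Hs = H @ s
    return H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / (y @ s)

theta = np.zeros(dim)
# Per-minibatch quadratic model: expansion point x_i, gradient g_i, Hessian estimate H_i.
pts = [theta.copy() for _ in range(n_batches)]
grads = [f_df(theta, i)[1] for i in range(n_batches)]
hess = [np.eye(dim) for _ in range(n_batches)]

for step in range(100):
    # Minimize the sum of the quadratic models
    #   sum_i [ g_i^T (x - x_i) + 0.5 (x - x_i)^T H_i (x - x_i) ]
    # in closed form: (sum_i H_i) x = sum_i (H_i x_i - g_i).
    H_sum = sum(hess)
    rhs = sum(H @ x - g for H, x, g in zip(hess, pts, grads))
    theta = np.linalg.solve(H_sum, rhs)

    # Evaluate a single minibatch at the new point (as in SGD) and refresh only its model.
    i = step % n_batches
    _, g = f_df(theta, i)
    s, y = theta - pts[i], g - grads[i]
    if s @ s > 1e-12 and y @ s > 1e-10:  # curvature condition for a safe BFGS update
        hess[i] = bfgs_update(hess[i], s, y)
    pts[i], grads[i] = theta.copy(), g

print("total objective after 10 passes:",
      sum(f_df(theta, i)[0] for i in range(n_batches)))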
Paper reference:
Jascha Sohl-Dickstein, Ben Poole, and Surya Ganguli
Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods
International Conference on Machine Learning (2014)
http://arxiv.org/abs/1311.2115
Biography:
Jascha Sohl-Dickstein is an Academic Resident at the Khan Academy and a visiting scholar in applied physics in Surya Ganguli's lab at Stanford University. He earned his PhD in 2012 at the Redwood Center for Theoretical Neuroscience at UC Berkeley, in Bruno Olshausen's lab. His research interests involve applying ideas from statistical physics and dynamical systems to problems in machine learning and neuroscience.
Host: Greg Ver Steeg
Location: Information Sciences Institute (ISI) - 1135
WebCast Link: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=925b9f53d4eb4964a37af20bacde2ad31d
Audiences: Everyone Is Invited