New Temporal-Difference Methods Based on Gradient Descent
Wed, Feb 18, 2009 @ 04:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Prof. Richard Sutton, University of Alberta
Host: Prof. Stefan Schaal
Abstract:
Temporal-difference methods based on gradient descent and parameterized function approximators form a core part of the modern field of reinforcement learning and are essential to many of its large-scale applications. However, the most popular methods, including TD(lambda), Q-learning, and Sarsa, are not true gradient-descent methods and, as a result, the conditions under which they converge are narrower and less robust than can usually be guaranteed for gradient-descent methods. In this paper we introduce a new family of temporal-difference algorithms whose expected updates are in the direction of the gradient of a natural performance measure that we call the "mean squared projected Bellman error". Because these are true gradient-descent methods, we are able to apply standard techniques to prove them convergent and stable under general conditions including, for the first time, off-policy training. The new methods are of the same order of complexity as TD(lambda) and, when TD(lambda) converges, they converge at a similar rate to the same fixed points. The new methods are similar to GTD(0) (Sutton, Szepesvari & Maei, 2009), but are based on a different objective function and are much more efficient, as we demonstrate in a series of computational experiments. (This is joint work with Hamid Maei, Doina Precup, Csaba Szepesvari, Shalabh Bhatnagar, David Silver, and Eric Wiewiora.)
Biography:
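To give a flavor of the algorithm family described in the abstract, the following is a minimal sketch of one such gradient-TD update (the TDC variant, with linear function approximation) as presented in the associated paper. The function name, step sizes, and feature vectors below are illustrative assumptions, not the exact notation of the talk:

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One gradient-TD (TDC-style) update for linear value estimation.

    theta: primary weight vector defining the value estimate phi @ theta
    w:     auxiliary weight vector estimating the expected TD error
    phi, phi_next: feature vectors of the current and next state
    """
    # Standard TD error for the transition.
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # Primary update: the usual TD step plus a gradient-correction term
    # that makes the expected update follow the gradient of the
    # mean squared projected Bellman error.
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    # Auxiliary update: w tracks delta as a linear function of phi.
    w = w + beta * (delta - w @ phi) * phi
    return theta, w

# Illustrative use on a single transition with 3 binary features.
theta = np.zeros(3)
w = np.zeros(3)
phi = np.array([1.0, 0.0, 0.0])
phi_next = np.array([0.0, 1.0, 0.0])
theta, w = tdc_update(theta, w, phi, phi_next,
                      reward=1.0, gamma=0.9, alpha=0.1, beta=0.1)
```

Both weight vectors are the same size as the feature vector, so each update is linear in the number of features, matching the claim that the new methods have the same order of complexity as TD(lambda).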
Richard S. Sutton is a professor and iCORE chair in the Department of Computing Science at the University of Alberta. He is a fellow of the Association for the Advancement of Artificial Intelligence and co-author of the textbook Reinforcement Learning: An Introduction from MIT Press. Before joining the University of Alberta in 2003, he worked in industry at AT&T and GTE Labs, and in academia at the University of Massachusetts. He received a PhD in computer science from the University of Massachusetts in 1984 and a BA in psychology from Stanford University in 1978. Rich's research interests center on the learning problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is also interested in animal learning psychology, in connectionist networks, and generally in systems that continually improve their representations and models of the world.
Location: Ronald Tutor Hall of Engineering (RTH) - 406
Audiences: Everyone Is Invited
Contact: CS Colloquia