
CS Distinguished Lecture: Sham Kakade (University of Washington) – Sub-Linear Reinforcement Learning
Tue, Mar 20, 2018 @ 04:00 PM - 05:20 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Sham Kakade, University of Washington
Talk Title: Sub-Linear Reinforcement Learning
Series: Computer Science Distinguished Lecture Series
Abstract: Suppose an agent is in an unknown environment and seeks to maximize his/her long-term future reward. We consider the basic question: does the agent need to learn an accurate model of the environment before he/she can start executing a near-optimal long-term course of actions?
Specifically, this talk will consider the problem of provably optimal reinforcement learning for (episodic) finite-horizon MDPs, i.e., how an agent learns to maximize his/her (long-term) reward in an uncertain environment. The talk will present a novel algorithm, Variance-reduced Upper Confidence Q-learning (vUCQ), the first algorithm to enjoy a regret bound that is both sub-linear in the model size and minimax optimal. The algorithm is sub-linear in that the number of samples needed to achieve epsilon average regret is far less than that required to learn any (non-trivial) estimate of the underlying model of the environment. The importance of sub-linear algorithms is largely the motivation for algorithms such as "Q-learning" and other "model-free" approaches.
vUCQ is a successive refinement method in which the algorithm reduces the variance in the "Q-value" estimates and couples this estimation scheme with an upper-confidence-based algorithm. Technically, the coupling of these two techniques is what leads to the algorithm's strong guarantees, showing that "model-free" approaches can be optimal.
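For readers unfamiliar with the upper-confidence idea mentioned above, the following is a minimal illustrative sketch, not vUCQ itself. It shows plain tabular Q-learning for an episodic finite-horizon MDP with optimistic initialization and a count-based exploration bonus; the variance-reduction step is omitted because the abstract does not specify it. The function names, the `step` callback, the toy environment, the learning-rate schedule, and the bonus constant `c` are all assumptions made for illustration.

```python
import math
import random


def toy_step(h, s, a, rng):
    """Hypothetical toy environment: action 1 pays reward 1, action 0 pays 0.

    The next state simply records the last action; h, s, and rng are unused.
    """
    return float(a), a


def ucb_q_learning(n_states, n_actions, horizon, step, n_episodes, c=0.1, seed=0):
    """Optimistic tabular Q-learning (illustrative sketch, not vUCQ).

    step(h, s, a, rng) must return (reward in [0, 1], next_state).
    Every episode starts in state 0 (an assumption of this sketch).
    """
    rng = random.Random(seed)
    log_term = math.log(n_episodes * horizon + 1)
    # Optimistic start: no value can exceed the number of remaining steps.
    Q = [[[float(horizon - h)] * n_actions for _ in range(n_states)]
         for h in range(horizon)]
    # Visit counts per (step, state, action).
    N = [[[0] * n_actions for _ in range(n_states)] for _ in range(horizon)]
    returns = []
    for _ in range(n_episodes):
        s, total = 0, 0.0
        for h in range(horizon):
            # Act greedily with respect to the optimistic Q-values.
            a = max(range(n_actions), key=lambda b: Q[h][s][b])
            r, s_next = step(h, s, a, rng)
            total += r
            N[h][s][a] += 1
            t = N[h][s][a]
            lr = (horizon + 1) / (horizon + t)  # assumed step-size schedule
            bonus = c * horizon * math.sqrt(log_term / t)  # shrinks with visits
            if h == horizon - 1:
                v_next = 0.0
            else:
                # Clip the next-step value at its largest possible value.
                v_next = min(float(horizon - h - 1), max(Q[h + 1][s_next]))
            Q[h][s][a] = (1 - lr) * Q[h][s][a] + lr * (r + v_next + bonus)
            s = s_next
        returns.append(total)
    return Q, returns
```

Calling `ucb_q_learning(2, 2, 3, toy_step, 2000)` runs the sketch on the toy environment. Note that no transition or reward model is ever estimated: only Q-values and visit counts are stored, which is the sense in which such methods are called "model-free".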
This lecture satisfies requirements for CSCI 591: Research Colloquium.
Biography: Sham Kakade is a Washington Research Foundation Data Science Chair, with a joint appointment in the Department of Statistics and the Department of Computer Science at the University of Washington.
From 2011 to 2015, he was a principal research scientist at Microsoft Research, New England. From 2010 to 2012, he was an associate professor in the Department of Statistics, Wharton, University of Pennsylvania. From 2005 to 2009, he was an assistant professor at the Toyota Technological Institute at Chicago.
He completed his PhD at the Gatsby Computational Neuroscience Unit under the supervision of Peter Dayan, and he was an undergraduate at Caltech, where he obtained his BS in physics. He was a postdoc in the Computer and Information Science department at the University of Pennsylvania under the supervision of Michael Kearns.
Host: Computer Science Department
Location: Henry Salvatori Computer Science Center (SAL) - 101
Audiences: Everyone Is Invited
Contact: Computer Science Department