Fri, Feb 25, 2022 @ 02:00 PM - 03:00 PM
Thomas Lord Department of Computer Science
Time: 2:00-3:00pm Feb 25, Friday
Committee: Haipeng Luo (chair), Rahul Jain, David Kempe, Ashutosh Nayyar, Vatsal Sharan.
Title: Online Goal-Oriented Reinforcement Learning
Abstract: Reinforcement Learning (RL) studies how an agent learns to behave optimally in an unknown environment. It has been a popular topic in both industry and academia since AlphaGo demonstrated its great potential. However, there is still a large gap between the theory and practice of RL due to the strong assumptions made in theoretical RL. My research focuses on online learning in a goal-oriented Markov Decision Process model named Stochastic Shortest Path (SSP), where the learner's objective is to reach a goal state with the smallest possible cost. Many real-world applications, such as games, car navigation, and robotic manipulation, can be modeled as SSP. To understand the SSP model better, we first focus on establishing minimax regret bounds in various settings. Specifically, for SSP with stochastic costs, we develop a simple minimax optimal algorithm concurrently with other works; for SSP with adversarial costs, we develop efficient minimax optimal algorithms with known transition, and near-optimal algorithms with unknown transition. Next, we focus on developing practical learning algorithms for SSP from different perspectives. Specifically, we develop the first model-free algorithm, the first set of policy optimization algorithms, and improved algorithms with linear function approximation.
For future work, I plan to study SSP in more general settings and develop more practical algorithms. For example, I plan to study non-stationary SSP, where both the transition and cost functions change over time, and SSP under general function approximation. I also plan to develop parameter-free SSP algorithms under different settings.
WebCast Link: https://usc.zoom.us/j/97003272644
Audiences: Everyone Is Invited
Contact: Lizsl De Leon