PhD Dissertation Defense - Aniruddh Puranic
Tue, Mar 26, 2024 @ 03:00 PM - 05:00 PM
Thomas Lord Department of Computer Science
Committee: Jyotirmoy V. Deshmukh (Chair), Gaurav Sukhatme, Stefanos Nikolaidis, and Stephen Tu
Title: Sample-Efficient and Robust Neurosymbolic Learning from Demonstrations
Abstract: Learning-from-demonstrations (LfD) is a popular paradigm for obtaining effective robot control policies for complex tasks via reinforcement learning (RL) without the need to explicitly design reward functions. However, it is susceptible to imperfections in the demonstrations and raises concerns about the safety and interpretability of the learned control policies. To address these issues, this thesis develops a neurosymbolic learning framework: a hybrid method that integrates neural network-based learning with symbolic (e.g., rule-, logic-, or graph-based) reasoning to leverage the strengths of both approaches. Specifically, the framework uses Signal Temporal Logic (STL) to express high-level robotic tasks and its quantitative semantics to evaluate and rank the quality of demonstrations. Temporal logic-based specifications allow us to create non-Markovian rewards and can also define causal dependencies between tasks, such as sequential task specifications. This dissertation presents the LfD-STL framework, which learns from even suboptimal/imperfect demonstrations and STL specifications to infer reward functions; these reward functions can then be used by reinforcement learning algorithms to obtain control policies. Experimental evaluations on a diverse set of environments show that the additional information in the form of formally specified task objectives allows the framework to outperform prior state-of-the-art LfD methods. Many real-world robotic tasks consist of multiple objectives (specifications), some of which may be inherently competitive, thus requiring deliberate trade-offs.
The dissertation then extends the LfD-STL framework by developing a new metric called the performance graph: a directed graph that uses the quality of demonstrations to provide intuitive explanations of the performance and trade-offs of demonstrated behaviors. The performance graph also offers concise insights into the learning process of the RL agent, thereby enhancing interpretability, as corroborated by a user study. Finally, the thesis discusses how performance graphs can be used as an optimization objective to guide RL agents toward policies that potentially outperform the (imperfect) demonstrators via apprenticeship learning (AL). The theoretical machinery developed for the AL-STL framework examines guarantees on the safety and performance of RL agents.
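To illustrate the idea of ranking demonstrations by STL quantitative semantics, the sketch below computes the standard robustness of a simple "always" specification G(x > c) over sampled trajectories and sorts demonstrations by that score. This is a minimal illustration under assumed names (the spec, threshold, and functions are hypothetical), not code from the dissertation.

```python
# Minimal sketch of STL quantitative semantics (robustness) used to rank
# demonstrations. The example specification is G(x > c); all names are
# illustrative, not taken from the dissertation.

def rob_predicate(signal, c):
    # Robustness of the atomic predicate (x > c) at each time step:
    # positive when satisfied, negative when violated, by margin x - c.
    return [x - c for x in signal]

def rob_globally(rho):
    # G (always): worst-case robustness over the horizon.
    return min(rho)

def rob_eventually(rho):
    # F (eventually): best-case robustness over the horizon.
    return max(rho)

def rank_demonstrations(demos, c):
    # Score each demonstration by the robustness of G(x > c),
    # then sort best-first; higher robustness = better demo.
    scored = [(rob_globally(rob_predicate(d, c)), d) for d in demos]
    return sorted(scored, key=lambda s: s[0], reverse=True)

demos = [
    [1.2, 1.5, 1.1, 1.4],  # stays above the threshold throughout
    [1.2, 0.8, 1.3, 1.5],  # dips below the threshold once
]
ranking = rank_demonstrations(demos, 1.0)
# The first demonstration ranks higher: its worst-case margin is
# positive, while the second violates G(x > 1.0) at one time step.
```

Because robustness is a signed margin rather than a Boolean verdict, even imperfect demonstrations receive a graded score, which is what lets the framework weight suboptimal data rather than discard it.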
Location: Ronald Tutor Hall of Engineering (RTH) - 306
Audiences: Everyone Is Invited
Contact: Aniruddh Puranic
Event Link: https://usc.zoom.us/j/98964159897?pwd=a2ljaGNEOGcvMkl1WU9yZENPc0M1dz09