Events for March 31, 2021
Computer Science General Faculty Meeting
Wed, Mar 31, 2021 @ 12:00 PM - 02:00 PM
Thomas Lord Department of Computer Science
Receptions & Special Events
Biweekly faculty meeting for invited full-time Computer Science faculty only. Event details are emailed directly to attendees.
Location: TBD
Audiences: Invited Faculty Only
Contact: Assistant to CS chair
This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.
Undergraduate Advisement Drop-in Hours
Wed, Mar 31, 2021 @ 01:30 PM - 02:30 PM
Thomas Lord Department of Computer Science
Workshops & Infosessions
Do you have a quick question? The CS advisement team will be available for drop-in live chat advisement for declared undergraduate students in our four majors during the spring semester on Tuesdays, Wednesdays, and Thursdays from 1:30pm to 2:30pm Pacific Time. Access the live chat on our website at: https://www.cs.usc.edu/chat/
Location: Online
Audiences: Undergrad
Contact: USC Computer Science
This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.
Thesis Proposal - Chi Zhang
Wed, Mar 31, 2021 @ 03:00 PM - 04:00 PM
Thomas Lord Department of Computer Science
University Calendar
Title:
Safe Reinforcement Learning via Offline Learning
Committee:
Viktor Prasanna
Bistra Dilkina
Paul Bogdan
Ashutosh Nayyar
Jyo Deshmukh
Kannan
Abstract:
Reinforcement Learning (RL) is a general learning paradigm for solving sequential decision-making problems, which are often modeled as a Markov Decision Process (MDP) or a Partially Observable Markov Decision Process (POMDP). Reinforcement learning aims to learn policies that maximize the expected accumulated reward when the dynamics or transition probabilities are unknown. Deep reinforcement learning (DRL) refers to using deep neural networks as general function approximators when applying RL algorithms.
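As background, the objective referred to above can be written in standard notation (not specific to this proposal) as finding a policy that maximizes the expected discounted return:

\[
  \pi^{*} \;=\; \arg\max_{\pi}\; J(\pi)
  \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi,\,P}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
\]

where \(P\) is the (unknown) transition probability, \(r\) the reward function, and \(\gamma \in [0,1)\) the discount factor.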
Despite the recent success of RL algorithms in robotics and games (e.g., AlphaGo), they pose particular challenges when applied to real-world settings. First, they often require substantial exploration to achieve reasonable performance; such exploration is either too expensive (e.g., gathering data in the real world takes time) or forbidden due to safety constraints. This limits RL algorithms to scenarios where an accurate simulator is available.
In this proposal, we focus on developing reinforcement learning algorithms that can ensure safety during the training phase and the deployment phase. We argue that by leveraging offline learning from a static dataset collected by existing safe policies, safety can be guaranteed.
However, standard off-policy RL algorithms are prone to overestimating the values of out-of-distribution (OOD) actions, which may cause the learned policies to visit unexplored and unsafe states at deployment. To mitigate this issue, we first show mathematically that by constraining the learned policies to the support set of the offline dataset, the state distribution of the learned policy also lies within the support set of the offline dataset; hence safety is guaranteed.
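Roughly formalized (our notation; the precise statement and proof are part of the proposed work), the claim is that constraining the policy's action support implies a constraint on its induced state distribution:

\[
  \operatorname{supp}\!\big(\pi(\cdot \mid s)\big) \subseteq \operatorname{supp}\!\big(\beta(\cdot \mid s)\big)\ \ \forall s
  \quad\Longrightarrow\quad
  \operatorname{supp}\!\big(d^{\pi}\big) \subseteq \operatorname{supp}\!\big(d^{\beta}\big),
\]

where \(\beta\) is the behavior policy that collected the offline dataset and \(d^{\pi}\), \(d^{\beta}\) are the state distributions induced by \(\pi\) and \(\beta\); states outside the dataset's support are therefore never visited, which is the safety argument above.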
To constrain the learned policies to this support set, we propose (i) distribution matching and (ii) model-based detection of OOD-action generalization.
We improve upon existing state-of-the-art behavior-regularization approaches and propose BRAC+: Improved Behavior Regularized Actor Critic. It introduces two key improvements: an analytical upper bound on the KL divergence used as the behavior regularizer, which reduces the variance associated with sample-based estimation, and a gradient-penalized Q update that avoids out-of-distribution (OOD) actions arising from the unbounded gradient of the Q value with respect to OOD actions. Distribution matching is too conservative when the dataset is diverse enough that the outcomes of OOD actions can be correctly predicted. In that case, we propose to learn an inverse dynamics model as a variational auto-encoder alongside a forward dynamics model, and to detect OOD-action generalization by the agreement of the two models. Our approach will be evaluated on several benchmarks as well as a simulated building HVAC control testbed. We will gauge the success of our work by (i) whether the safety criteria are met and (ii) the performance improvement over the existing safe policies used to collect the dataset.
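For readers unfamiliar with behavior regularization, a minimal sketch of the core idea in PyTorch is given below. The names actor, critic, behavior_policy, and the weight alpha are hypothetical placeholders, and this is not the BRAC+ implementation described in the proposal (in particular it omits the analytical KL upper bound and the gradient-penalized Q update).

    # Minimal sketch of a behavior-regularized actor objective (assumes PyTorch;
    # actor, critic, and behavior_policy are hypothetical modules mapping states
    # to torch.distributions / Q-values -- not the BRAC+ code itself).
    import torch

    def behavior_regularized_actor_loss(actor, critic, behavior_policy,
                                        states, alpha=1.0):
        pi = actor(states)                     # learned policy pi(. | s)
        actions = pi.rsample()                 # reparameterized action sample
        q = critic(states, actions)            # critic's value estimate Q(s, a)
        beta = behavior_policy(states)         # estimated dataset (behavior) policy
        kl = torch.distributions.kl_divergence(pi, beta)  # KL(pi || beta) per state
        # Maximize Q while penalizing divergence from the behavior policy,
        # i.e. keep the learned policy close to actions seen in the dataset.
        return (alpha * kl - q).mean()

In BRAC-style methods the weight alpha trades off expected return against staying close to the data; per the abstract, BRAC+ additionally replaces the sampled KL term with an analytical upper bound and penalizes the Q-function's gradient with respect to OOD actions.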
Zoom Link:
https://usc.zoom.us/j/2488070010
WebCast Link: https://usc.zoom.us/j/2488070010
Audiences: Everyone Is Invited
Contact: Lizsl De Leon
This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.