University of Southern California

Events Calendar


  • NL Seminar - Efficient Exploration for Dialog Policy Learning with BBQ Networks & Replay Buffer Spiking

    Fri, Sep 16, 2016 @ 01:30 PM - 02:30 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars


    Speaker: Zachary Lipton, UCSD

    Talk Title: Efficient Exploration for Dialog Policy Learning with BBQ Networks & Replay Buffer Spiking

    Series: Natural Language Seminar

    Abstract: When rewards are sparse and efficient exploration is essential, deep Q-learning with ε-greedy exploration tends to fail. This poses problems for otherwise promising domains such as task-oriented dialog systems, where the primary reward signal, indicating successful completion, typically occurs only at the end of each episode but depends on the entire sequence of utterances. A poor agent rarely encounters such successful dialogs, and a random agent may never stumble upon a successful outcome in reasonable time. We present two techniques that significantly improve the efficiency of exploration for deep Q-learning agents in dialog systems. First, we demonstrate that exploration by Thompson sampling, using Monte Carlo samples from a Bayes-by-Backprop neural network, yields marked improvement over standard DQNs with Boltzmann or ε-greedy exploration. Second, we show that spiking the replay buffer with a small number of successful dialogs, which are easy to harvest for dialog tasks, can make Q-learning feasible when it might otherwise fail catastrophically.

    Biography: I am a graduate student in the Artificial Intelligence Group at the University of California, San Diego, on leave for two quarters at Microsoft Research Redmond. I work on machine learning, focusing on deep learning methods and applications. In particular, I work on modeling sequential data with recurrent neural networks and sequential decision-making processes with deep reinforcement learning. I'm especially interested in research impacting medicine and natural language processing. Recently, in "Learning to Diagnose with LSTM RNNs," we trained LSTM RNNs to accurately predict patient diagnoses using only lightly processed time series of sensor readings in the pediatric ICU. Before coming to UCSD, I completed a Bachelor of Arts with a joint major in Mathematics and Economics at Columbia University. Then I worked in New York City as a jazz musician. I have interned with Amazon's Core Machine Learning team and Microsoft Research's Deep Learning team.

    Host: Xing Shi and Kevin Knight

    More Info: http://nlg.isi.edu/nl-seminar/

    Location: Information Sciences Institute (ISI) - 6th Floor - CR #689; ISI-Marina del Rey

    Audiences: Everyone Is Invited

    Contact: Peter Zamar

    Event Link: http://nlg.isi.edu/nl-seminar/

