USC - Viterbi School of Engineering

CS Colloquium: Kate Saenko (University of Massachusetts Lowell) - From Video to Sentences: A Deep Learning Approach
Wed, May 06, 2015 @ 11:00 AM - 12:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars

Speaker: Kate Saenko, University of Massachusetts Lowell

Talk Title: From Video to Sentences: A Deep Learning Approach

Series: CS Colloquium

Abstract: I will describe several recent advances in automatic generation of natural language descriptions for video. Video description has important applications in human-robot interaction, video indexing, and describing movies for the blind. Real-world videos often have complex dynamics, but current methods are insensitive to temporal structure and do not allow both input (sequence of frames) and output (sequence of words) of variable length. I will describe a novel sequence-to-sequence neural network that learns to generate captions for brief videos. The model is trained on video-sentence pairs and is naturally able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e. a language model. To further handle the ambiguity over multiple objects and locations, the model incorporates convolutional networks with Multiple Instance Learning (MIL) to consider objects in different positions and at different scales simultaneously. The multi-scale multi-instance convolutional network is integrated with a sequence-to-sequence recurrent neural network to generate sentence descriptions based on the visual representation. This architecture is the first end-to-end trainable deep neural network that is capable of multi-scale region processing for video description. I will show results of captioning YouTube videos and Hollywood movies.

Biography: Kate Saenko is an Assistant Professor of Computer Science at the University of Massachusetts Lowell. She received her PhD from MIT, followed by postdoctoral work at UC Berkeley and Harvard. Her research spans the areas of computer vision, machine learning, speech recognition, and human-robot interfaces. Dr. Saenko's current research interests include domain adaptation for object recognition and joint modeling of language and vision. She is involved in a large multi-institution NSF-sponsored project, conducting research in statistical scene understanding and physics-based visual reasoning. She is also a recipient of an NSF EAGER award to analyze domain invariance of deep learning models. Previously, she was involved in DARPA's Mind's Eye project, developing methods for recognizing and describing human activities in video.

Host: Fei Sha

Location: Grace Ford Salvatori Hall Of Letters, Arts & Sciences (GFS) - 222
Audiences: Everyone Is Invited

Contact: Assistant to CS chair

This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.