University of Southern California

Events Calendar


  • CS Colloquium: Kate Saenko (University of Massachusetts Lowell) - From Video to Sentences: A Deep Learning Approach

    Wed, May 06, 2015 @ 11:00 AM - 12:00 PM

    Thomas Lord Department of Computer Science

    Conferences, Lectures, & Seminars


    Speaker: Kate Saenko, University of Massachusetts Lowell

    Talk Title: From Video to Sentences: A Deep Learning Approach

    Series: CS Colloquium

    Abstract: I will describe several recent advances in the automatic generation of natural language descriptions for video. Video description has important applications in human-robot interaction, video indexing, and describing movies for the blind. Real-world videos often have complex dynamics, but current methods are insensitive to temporal structure and do not allow both the input (a sequence of frames) and the output (a sequence of words) to be of variable length. I will describe a novel sequence-to-sequence neural network that learns to generate captions for brief videos. The model is trained on video-sentence pairs and naturally learns both the temporal structure of the frame sequence and a sequence model of the generated sentences, i.e., a language model. To further handle ambiguity over multiple objects and locations, the model incorporates convolutional networks with Multiple Instance Learning (MIL), which consider objects at different positions and scales simultaneously. This multi-scale, multi-instance convolutional network is integrated with a sequence-to-sequence recurrent neural network that generates sentence descriptions from the visual representation. The result is the first end-to-end trainable deep neural network capable of multi-scale region processing for video description. I will show results of captioning YouTube videos and Hollywood movies.
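    The abstract does not say how the multiple-instance network combines scores from regions at different positions and scales. As an illustration only, the sketch below assumes the noisy-OR rule commonly used in MIL for visual concept detection: a frame is treated as a bag of region instances, and the bag is scored positive for a word if at least one region is. The function names are hypothetical, and this is not the speaker's implementation.

```python
from typing import Dict, Sequence


def noisy_or(instance_probs: Sequence[float]) -> float:
    """Hypothetical MIL pooling: bag-level probability from instance probabilities.

    Assumes instances (regions) are independent; the bag (frame) is positive
    if at least one instance is positive, so P(bag) = 1 - prod(1 - p_i).
    """
    prob_all_negative = 1.0
    for p in instance_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        prob_all_negative *= 1.0 - p
    return 1.0 - prob_all_negative


def multi_scale_bag_score(scores_by_scale: Dict[int, Sequence[float]]) -> float:
    """Hypothetical helper: pool region probabilities from every scale into one score.

    Keys are scale indices; values are per-region probabilities for one word
    at that scale. All regions from all scales form a single bag.
    """
    all_regions = [p for regions in scores_by_scale.values() for p in regions]
    return noisy_or(all_regions)
```

    For example, two regions each scored 0.5 give a bag score of 0.75. Noisy-OR is differentiable, so it can sit inside an end-to-end trainable network; a max over regions is a simpler alternative that passes gradient only to the top-scoring region.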

    Biography: Kate Saenko is an Assistant Professor of Computer Science at the University of Massachusetts Lowell. She received her PhD from MIT, followed by postdoctoral work at UC Berkeley and Harvard. Her research spans computer vision, machine learning, speech recognition, and human-robot interfaces. Dr. Saenko's current research interests include domain adaptation for object recognition and joint modeling of language and vision. She is involved in a large multi-institution, NSF-sponsored project conducting research in statistical scene understanding and physics-based visual reasoning. She has also received an NSF EAGER award to analyze the domain invariance of deep learning models. Previously, she was involved in DARPA's Mind's Eye project, developing methods for recognizing and describing human activities in video.

    Host: Fei Sha

    Location: Grace Ford Salvatori Hall of Letters, Arts & Sciences (GFS) - 222

    Audiences: Everyone Is Invited

    Contact: Assistant to CS chair

