Multimodal Emotion Recognition: Quantifying Dynamics and Structure in Audio-Visual Expressive Speech
Thu, Oct 11, 2018 @ 02:00 PM - 04:00 PM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars
Speaker: Yelin (Lynn) Kim, Ph.D., Assistant Professor, University at Albany, SUNY
Talk Title: Multimodal Emotion Recognition: Quantifying Dynamics and Structure in Audio-Visual Expressive Speech
Abstract: The rise of AI assistant systems, including Google Home, Apple Siri, and Amazon Echo, creates an urgent need for a deeper understanding of users. In this talk, I will present algorithmic and statistical methods for analyzing audio-visual human behavior, focusing on emotional and social signals inferred from speech and facial expressions. These methods can provide emotional intelligence to AI systems. However, developing automatic emotion recognition systems is challenging because emotional expressions are complex, dynamic, inherently multimodal, and entangled with other factors of modulation (e.g., speech generation and emphasis). I will present several algorithms that address these fundamental challenges in emotion recognition: (i) cross-modal modeling methods that capture and control for interactions between individual facial regions and speech using Minimum Description Length (MDL) principle-based segmentation; (ii) localization and prediction of events with salient emotional behaviors using max-margin optimization and dynamic programming; and (iii) temporal modeling methods that learn co-occurrence patterns between emotional behaviors and emotion label noise. These algorithms have enabled advances in the modeling of audio-visual emotion recognition systems and have increased the understanding of the underlying dynamic and multimodal structure of affective communication (e.g., cross-modal interaction, temporal structure, and inherent perceptual ambiguity).
Biography: Yelin Kim [http://yelinkim.com] is an Assistant Professor in the Department of Electrical and Computer Engineering at the University at Albany, State University of New York (SUNY). She received her M.S. and Ph.D. in Electrical and Computer Engineering from the University of Michigan, Ann Arbor, in 2013 and 2016, respectively, and her B.S. in Electrical and Computer Engineering from Seoul National University, South Korea, in 2011. Her main research interests are human-centered and affective computing, multimodal (audio-visual) modeling, and computational behavior analysis. Her work has been recognized with several awards, including a Google Faculty Research Award (2018), a SUNY-A Faculty Research Award (2017), and the Best Student Paper Award at ACM Multimedia (2014).
Host: Dr. Shrikanth Narayanan
Audiences: Everyone Is Invited
Posted By: Tanya Acevedo-Lam/EE-Systems