
Events Calendar


  • Long-Term Context Modeling for Acoustic-Linguistic Emotion Recognition

    Thu, Mar 11, 2010 @ 02:00 PM - 03:00 PM

    Ming Hsieh Department of Electrical and Computer Engineering

    Conferences, Lectures, & Seminars


    Abstract:
    The automatic estimation of human affect from the speech signal is an important step towards making virtual agents more natural and human-like. We therefore present a novel technique for incremental recognition of the user's emotional state, applied in a Sensitive Artificial Listener (SAL) system designed for socially competent human-machine communication. Our method uses acoustic, linguistic, and long-range contextual information to continuously predict the current quadrant in a two-dimensional emotional space spanned by the dimensions valence and activation. The main system components are a hierarchical Dynamic Bayesian Network for detecting linguistic keyword features and Long Short-Term Memory recurrent neural networks that model phoneme context and emotional history to predict the affective state of the user. We evaluate various keyword spotting model architectures for linguistic feature generation as well as different strategies for extracting relevant acoustic features from the speech signal. In experiments on the SAL corpus of non-prototypical real-life emotional speech, we obtain a quadrant prediction accuracy comparable to the average inter-labeler consistency.
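
    For readers curious about the general idea, the following is a minimal, purely illustrative sketch (not the SEMAINE/SAL implementation described in the talk): a PyTorch LSTM that emits a valence/activation quadrant estimate at every frame from concatenated acoustic and keyword-spotting features. The feature dimensions, layer sizes, and the use of PyTorch are all assumptions made for illustration.

    # Illustrative sketch only -- not the speaker's system. A unidirectional
    # LSTM maps a sequence of per-frame feature vectors (assumed here to be
    # acoustic descriptors concatenated with keyword-presence indicators)
    # to one of the four valence/activation quadrants at every time step.
    import torch
    import torch.nn as nn

    NUM_QUADRANTS = 4      # (+valence,+activation), (+,-), (-,+), (-,-)
    ACOUSTIC_DIM = 39      # hypothetical: e.g. frame-level spectral features
    LINGUISTIC_DIM = 10    # hypothetical: keyword-spotting indicator features

    class QuadrantLSTM(nn.Module):
        def __init__(self, hidden_size=128):
            super().__init__()
            self.lstm = nn.LSTM(ACOUSTIC_DIM + LINGUISTIC_DIM, hidden_size,
                                batch_first=True)
            self.out = nn.Linear(hidden_size, NUM_QUADRANTS)

        def forward(self, features):
            # features: (batch, time, ACOUSTIC_DIM + LINGUISTIC_DIM)
            hidden_states, _ = self.lstm(features)
            # One quadrant prediction per frame, so the estimate is
            # incremental: it can be read off while the user is still speaking.
            return self.out(hidden_states)

    if __name__ == "__main__":
        model = QuadrantLSTM()
        dummy = torch.randn(2, 100, ACOUSTIC_DIM + LINGUISTIC_DIM)  # 2 utterances, 100 frames
        logits = model(dummy)
        print(logits.shape)  # torch.Size([2, 100, 4])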
    Bio:
    Martin Wöllmer works as a researcher funded by the European Community's Seventh Framework Programme project SEMAINE at the Technische Universität München (TUM). He obtained his bachelor's degree and his diploma in Electrical Engineering and Information Technology from TUM for his work in the fields of multimodal data fusion and robust automatic speech recognition, respectively. His current research and teaching activities cover pattern recognition and speech processing, with a focus on robust keyword detection in emotionally colored and noisy speech, emotion recognition, and speech feature enhancement. His publications in various journals, books, and conference proceedings cover novel and robust modeling architectures for speech and emotion recognition such as Switching Linear Dynamic Models, Long Short-Term Memory recurrent neural nets, and Graphical Models.

    Location: Ronald Tutor Hall of Engineering (RTH) - 320

    Audiences: Everyone Is Invited

    Contact: Mary Francis

