USC - Viterbi School of Engineering

Dialocalization: Acoustic Speaker Diarization and Visual Localization as Joint Optimization Proble
Thu, Apr 01, 2010 @ 10:00 AM - 11:00 AM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars

Abstract:
Research in cognitive psychology suggests that the human brain is able to integrate different sensory modalities, such as sight, sound, and touch, into a perceptual experience that is coherent and unified. Experiments show that by considering input from multiple sensors, perceptual problems can be solved more robustly and even more efficiently. In computer science, however, synergistic use of data encoded for different human sensors has not yet lived up to its promise.

In the talk, I present a novel multimodal approach for unsupervised speaker localization in both time and space. Using recordings from a single, low-resolution room overview camera and a single far-field microphone, a state-of-the-art audio-only speaker diarization system (speaker localization in time) is extended so that both acoustic and visual models are estimated as part of a joint unsupervised optimization problem. The speaker diarization system first automatically determines the speech regions and estimates "who spoke when"; then, in a second step, the visual models are used to infer the location of the speakers in the video. We call this process "dialocalization". The proposed system exploits audio-visual integration not only to improve the accuracy of a state-of-the-art (audio-only) speaker diarization system, but also to add visual speaker localization at little incremental engineering and computation cost. The combined algorithm has different properties, such as increased robustness, that cannot be observed in algorithms based on single modalities. The talk describes the algorithm, presents benchmarking results, explains its properties, and systematically discusses the contributions of each modality.

Bio:
Dr. Gerald Friedland is a research scientist at the International Computer Science Institute (ICSI), a private research lab affiliated with the University of California, Berkeley, where he leads the speaker diarization research projects. He is also the co-PI on an NGA-funded project on Multimodal Location Detection and a member of the Executive Advisory Board of UC Berkeley's Opencast project.

Until recently, he was ICSI's site manager in the EU-funded project AMIDA and the Swiss-funded IM2 project, and he is a former co-PI on the DTO-VACE-funded project ROADMAP, all of which explore multimodal signal analysis to interpret people's behavior in meetings and videoconferences. He has published more than 80 peer-reviewed articles in conferences, journals, and books and is currently authoring a new textbook on multimedia computing together with Dr. Ramesh Jain. Dr. Friedland was program co-chair of the IEEE International Symposium on Multimedia 2008 and 2009. He co-founded the IEEE International Conference on Semantic Computing and is a proud founder and program director of the International Summer School on Semantic Computing at UC Berkeley. He is the recipient of several research and industry recognitions, among them the European Academic Software Award and the Multimedia Entrepreneur Award by the German Federal Department of Economics. Most recently, he led the team that won the ACM Multimedia Grand Challenge 2009. Dr. Friedland received his doctorate (summa cum laude) and master's degree in computer science from Freie Universitaet Berlin, Germany, in 2006 and 2002, respectively.

Hosts: Professor Shrikanth Narayanan and Dr. Kyu Han
Location: Ronald Tutor Hall of Engineering (RTH) - 320
Audiences: Everyone Is Invited

Contact: Mary Francis

This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.