Loss Minimization for Voice Onset Time (VOT) Measurement, Phoneme Alignment, and Phoneme Recognition
Tue, Oct 25, 2011 @ 10:30 AM - 11:30 AM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars
Speaker: Joseph Keshet, Ph.D., Research Assistant Professor, TTI-Chicago
Talk Title: Loss Minimization for Voice Onset Time (VOT) Measurement, Phoneme Alignment, and Phoneme Recognition
Abstract: In discriminative learning one is interested in training a system to optimize a certain desired measure of performance, or task loss. In binary classification one typically tries to minimize the error rate, but in prediction for more complex tasks, such as phoneme recognition or voice onset time (VOT) measurement, each task has its own loss. Phoneme recognition performance is measured in terms of phoneme error rate (edit distance), and VOT measurement is quantitatively assessed by the mean deviation from the manually labeled VOT. In the talk I will present two algorithms applied to VOT measurement, phoneme alignment, and phoneme recognition, where the goal is to minimize the specific loss for each task.
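For concreteness, the two task losses named above can be sketched in a few lines of Python; the function names and the per-utterance normalization are illustrative choices, not details taken from the talk.

```python
def edit_distance(ref, hyp):
    """Levenshtein (edit) distance between two phoneme sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # deletions needed to reach an empty hypothesis
    for j in range(n + 1):
        d[0][j] = j          # insertions needed from an empty reference
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (or match)
    return d[m][n]


def phoneme_error_rate(ref, hyp):
    """Phoneme recognition task loss: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def vot_loss(predicted_vot_ms, manual_vot_ms):
    """VOT task loss: absolute deviation from the manually labeled VOT (in ms)."""
    return abs(predicted_vot_ms - manual_vot_ms)


# Example: one substitution in a three-phoneme reference, and a 3.5 ms VOT deviation.
print(phoneme_error_rate(["b", "iy", "t"], ["b", "ih", "t"]))  # ~0.333
print(vot_loss(42.0, 38.5))                                    # 3.5
```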
In the first part of the talk I will present the problem of automatic VOT measurement and define its loss. I will describe an algorithm based on structural support vector machines (SVMs) that minimizes this loss. When the algorithm was applied to initial voiceless stops from four corpora (read and conversational speech), the agreement between automatic and manual measurements was found to be near human inter-judge agreement. The experimental results also show that this algorithm provides an accurate and efficient technique for large-scale phonetic analysis.
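As a rough illustration of the structural-SVM surrogate referred to here (not the talk's actual VOT system), the following sketch performs loss-augmented inference over an assumed finite set of candidate labelings; the feature map phi, the candidate set, and the hyperparameters lr and reg are hypothetical placeholders.

```python
import numpy as np


def structured_hinge(w, x, y_true, candidates, phi, task_loss):
    """Surrogate: max_y [task_loss(y, y_true) + w.phi(x, y)] - w.phi(x, y_true), clipped at 0."""
    score_true = float(np.dot(w, phi(x, y_true)))
    augmented = max(task_loss(y, y_true) + float(np.dot(w, phi(x, y))) for y in candidates)
    return max(0.0, augmented - score_true)


def subgradient_step(w, x, y_true, candidates, phi, task_loss, lr=0.1, reg=1e-3):
    """One stochastic subgradient step on the L2-regularized structured hinge."""
    # Loss-augmented inference: the candidate that most violates the margin
    # under the current model.
    y_hat = max(candidates,
                key=lambda y: task_loss(y, y_true) + float(np.dot(w, phi(x, y))))
    grad = reg * w
    violated = (task_loss(y_hat, y_true) + np.dot(w, phi(x, y_hat))
                > np.dot(w, phi(x, y_true)))
    if violated:
        grad = grad + phi(x, y_hat) - phi(x, y_true)
    return w - lr * grad
```

Note that the update is driven by the hinge surrogate rather than by the task loss itself, which is exactly the gap the second part of the talk addresses.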
While algorithms based on structural SVMs aim to minimize the task loss, they actually minimize a surrogate loss, with no guarantee on the task loss itself. In the second part of the talk, I will describe a new theorem stating that a general learning update rule corresponds directly to the gradient of the task loss. Based on this update rule, I will present a new algorithm for minimizing the task loss specific to phoneme alignment. I will present empirical results on phoneme alignment of a standard test set from the TIMIT corpus, which surpass all previously reported results on this problem. I will then show how this update rule can be applied to continuous-density HMMs and present empirical results on phoneme recognition of TIMIT, showing that our approach outperforms previous results on large-margin training of HMMs.
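Below is a hedged sketch of the kind of loss-gradient update described in this part, in the spirit of the speaker's direct-loss-minimization line of work; the finite perturbation eps, the enumerable candidate set, and the linear score w.phi(x, y) are simplifying assumptions, not the talk's actual HMM formulation.

```python
import numpy as np


def loss_gradient_update(w, x, y_true, candidates, phi, task_loss, lr=0.1, eps=1.0):
    """One approximate loss-gradient step for a linear model score(x, y) = w.phi(x, y)."""
    # Standard prediction: the highest-scoring output under the current model.
    y_pred = max(candidates, key=lambda y: float(np.dot(w, phi(x, y))))
    # Loss-adjusted prediction: the score is perturbed toward lower task loss.
    y_adj = max(candidates,
                key=lambda y: float(np.dot(w, phi(x, y))) - eps * task_loss(y, y_true))
    # For small eps, this feature difference approximates the gradient of the
    # expected task loss with respect to w.
    grad = (phi(x, y_pred) - phi(x, y_adj)) / eps
    return w - lr * grad
```

Descending this approximate gradient moves the model toward the features of the lower-loss prediction, which is what distinguishes it from the surrogate-based update sketched earlier.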
This is joint work with Chih-Chieh Cheng, Tamir Hazan, David McAllester, Morgan Sonderegger, and Mark Stoehr.
Biography: Joseph Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering from Tel Aviv University in 1994 and 2002, respectively. He received his Ph.D. in Computer Science from the School of Computer Science and Engineering at the Hebrew University of Jerusalem in 2007. From 1995 to 2002 he was a researcher at the IDF, where he won the prestigious "Israel Defense Prize" for outstanding research and development achievements. From 2007 to 2009 he was a post-doctoral researcher at the IDIAP Research Institute in Switzerland. Since 2009 he has been a Research Assistant Professor at TTI-Chicago. He founded and currently chairs the Machine Learning for Speech and Language Processing chapter of the International Speech Communication Association (ISCA), and was one of the organizers of the first Symposium on Machine Learning for Speech and Language Processing. His research interests are in speech and language processing, with a particular interest in speech recognition. His current research focuses on the design, analysis, and implementation of machine learning algorithms for the domain of speech and language processing.
Host: Professor Shrikanth Narayanan
Location: Hughes Aircraft Electrical Engineering Center (EEB) - 248
Audiences: Everyone Is Invited
Contact: Mary Francis