EE-SYSTEMS DISTINGUISHED LECTURER SERIES - Modeling Spoken Language
Wed, Sep 15, 2004 @ 03:00 PM - 04:00 PM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars
DISTINGUISHED LECTURER SERIES
"Modeling Spoken Language"
Prof. Mari Ostendorf
Electrical Engineering, University of Washington
Gerontology Auditorium (GER 124)
Wednesday, September 15, 2004
3:00-4:00 p.m.

Abstract: As storage costs plummet and speech recognition technology progressively improves, it becomes feasible to think of archiving and publishing "spoken documents" that can be accessed as easily as we do online text documents. The range of potentially interesting spoken documents is vast, including records of meetings, committee hearings, news broadcasts, and call center data, as well as multi-media documents that include speech recordings. Language processing technology for spoken documents is even more critical than for text, since it is much more cumbersome to mine audio recordings than text for useful information. A key component of both speech recognition technology and many subsequent language processing technologies is statistical language modeling. Language models are used to characterize word sequences as an information source (a discrete stochastic process) that is to be decoded from noisy observations, such as acoustic features in speech recognition or words in another language in machine translation. Despite the fact that language is known to have long-distance structure, the most widely used language model is a simple n-gram or (n-1)-order Markov process, estimated from word sequence counts in data representative of the target task. In addition, performance gains in language modeling in recent years have been driven as much by data collection as by advances in representation of linguistic structure. As vast text resources are increasingly available via the web, one might argue that this trend will continue. However, spoken language can be quite different from written language, particularly for informal conversational speech, transcripts of which are not as readily available as written text. Human language can vary substantially depending on topic and register, such that the addition of mismatched text to the training set can actually hurt language modeling performance when using simple n-gram models.
These observations argue for a decomposition of language at several levels, in terms of factors related to speaking style, topic, syntax and even morphology. This talk will show that leveraging larger data resources in learning models is synergistic with and not simply an alternative to representing structure in language, with examples of success stories in different languages and speech recognition tasks.

Bio: Mari Ostendorf received the B.S., M.S. and Ph.D. degrees in electrical engineering from Stanford University. In 1985, she joined the Speech Signal Processing Group at BBN Laboratories, and then became a member of the faculty of the Department of Electrical and Computer Engineering at Boston University in 1987. Since 1999, she has been a Professor of Electrical Engineering at the University of Washington, where she is an Endowed Professor of System Design Met Engineering. Her research interests are primarily in the area of statistical pattern recognition for non-stationary processes, particularly in speech processing applications, and her work has resulted in more than 140 publications. Her early work was in speech coding; more recently she has been involved in projects on both continuous speech recognition and synthesis. She has made contributions in segment-based and higher-order acoustic models, data selection and transformation for language modeling, stochastic models of prosody for both recognition and synthesis, and information extraction from speech. Dr. Ostendorf is a former editor of Computer Speech and Language, was Technical Co-Chair of the HLT-NAACL 2003 conference, and has served on numerous speech and language conference boards and technical committees.

Host: Prof. Shri Narayanan, x06432

***A reception will follow the seminar at 4:00 p.m.

Location: Ethel Percy Andrus Gerontology Center (GER) - Gerontology Auditorium (GER 124)
Audiences: Everyone Is Invited
Contact: Rosine Sarafian