USC CS Colloquium Series
Tue, Nov 21, 2006 @ 03:30 PM - 04:50 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Dr. David Chiang
Computer Scientist
USC Information Sciences Institute

Title: Finding Structure in Statistical Machine Translation

Abstract: The introduction of data-driven methods into machine translation (MT) in the 1990s created a whole new way of doing MT, and the recent move from the word-based models developed at IBM to the phrase-based models developed by Och and others has led to a breakthrough in MT performance.
The next breakthrough, the move to syntax-based models that deal with the hierarchical, meaning-bearing structures of sentences, is waiting in the wings. Only recently have such models, based on synchronous context-free grammars and related formalisms, become top contenders in large-scale evaluations such as those conducted by NIST, especially for Chinese-to-English translation. This framework also offers many avenues for potential advances.

I will present Hiero, to our knowledge the first grammar-based MT system to outperform a phrase-based baseline as measured by the widely used BLEU metric, and describe several related approaches. Two current challenges for this approach are: (1) How can the training and translation process be made efficient for extremely large amounts of data? (2) How can we obtain synchronous grammars that better model the structure of a parallel corpus? I will present recent progress and future work at ISI that address these two questions.

Biography: Dr. David Chiang has been a computer scientist at the Information Sciences Institute since January 2006. He completed his PhD at the University of Pennsylvania under the supervision of Dr. Aravind Joshi, working on formal language theory, statistical natural language processing, and computational biological sequence analysis. His current research is on using grammars and parsing for statistical machine translation.
Location: Seaver Science Library (SSL) - 150
Audiences: Everyone Is Invited
Contact: Nancy Levien