PhD Defense - Ashish Vaswani
Thu, Jun 12, 2014 @ 01:00 PM - 03:00 PM
Thomas Lord Department of Computer Science
PhD Candidate: Ashish Vaswani
Date: 12th June, 2014
Location: GFS 111
Time: 1pm
Committee:
Dr. David Chiang (Chair)
Dr. Liang Huang (Co-chair)
Dr. Kevin Knight
Dr. Jinchi Lv (Outside member)
Title: Smaller, Faster, and Accurate Models for Statistical Machine Translation
The goal of machine translation is to translate from one natural language into another using computers. The current dominant approach, statistical machine translation (SMT), uses large amounts of training data to automatically learn to translate from the source language to the target language. SMT systems typically contain three primary components: word alignment models, translation rules, and language models. These are some of the largest models in all of natural language processing, containing up to a billion parameters. Learning and employing these components pose difficult challenges of scale and generalization: using large models in statistical machine translation can slow down the translation process, and learning models with so many parameters can cause them to fit the training data too well, degrading their performance at test time. In this thesis, we improve SMT by addressing these issues of scale and generalization for word alignment, learning translation grammars, and language modeling.
Word alignments, which are correspondences between pairs of source and target words, are used to derive translation grammars. Good word alignments can result in good translation rules, improving downstream translation quality. We will present an algorithm for training unsupervised word alignment models with a prior that encourages learning smaller models, which improves both alignment and translation quality in large-scale SMT experiments; a rough illustration of the idea follows below.
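As a rough illustration only (not the algorithm or prior presented in the thesis), the sketch below runs EM for a simple IBM Model 1-style word alignment model and applies a hypothetical sparsity-style reweighting of the expected counts in the M-step, to show how a prior can push the translation table toward a smaller model. The toy corpus, the exponent `alpha`, and the update rule are all illustrative assumptions.

```python
# Illustrative sketch: IBM Model 1 EM with a hypothetical
# sparsity-encouraging reweighting of expected counts.
from collections import defaultdict

corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Uniform initialization of t(e | f).
t = defaultdict(lambda: 1.0)

for iteration in range(20):
    counts = defaultdict(float)   # expected counts of (e, f) pairs
    totals = defaultdict(float)   # expected counts of f
    for f_sent, e_sent in corpus:
        for e in e_sent:
            z = sum(t[(e, f)] for f in f_sent)   # normalize over possible alignments
            for f in f_sent:
                p = t[(e, f)] / z
                counts[(e, f)] += p
                totals[f] += p
    # M-step with an illustrative sparsity-inducing exponent: alpha < 1
    # shrinks small expected counts faster than large ones, mimicking the
    # effect of a prior that favors smaller translation tables.
    alpha = 0.9
    raised = {pair: c ** alpha for pair, c in counts.items()}
    new_totals = defaultdict(float)
    for (e, f), c in raised.items():
        new_totals[f] += c
    for (e, f), c in raised.items():
        t[(e, f)] = c / new_totals[f]
```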
SMT systems typically model the translation process as a sequence of translation steps, each of which uses a translation rule. Most statistical machine translation systems use composed rules (rules formed out of smaller rules in the grammar) to capture more context, improving translation quality. However, composition creates many more rules and larger grammars, making both training and decoding inefficient. We will describe an approach that uses Markov models to capture dependencies between a minimal set of translation rules, which yields a slimmer model and a faster decoder while achieving the same translation quality as composed rules; see the sketch after this paragraph for the general idea.
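To make the idea concrete, the sketch below scores a derivation as a sequence of minimal rules under a bigram Markov model, so that context is captured by rule-to-rule transition probabilities rather than by larger composed rules. The rule-identifier representation, the toy derivations, and the add-one smoothing are invented for illustration and are not the thesis's model.

```python
# Illustrative sketch: a bigram Markov model over sequences of minimal rules.
import math
from collections import defaultdict

# Each derivation is a sequence of minimal rule identifiers (toy data).
derivations = [
    ["R_NP->DT_NN", "R_DT->the", "R_NN->house"],
    ["R_NP->DT_NN", "R_DT->a", "R_NN->book"],
]

bigram = defaultdict(float)
unigram = defaultdict(float)
for rules in derivations:
    prev = "<s>"
    for r in rules:
        bigram[(prev, r)] += 1.0
        unigram[prev] += 1.0
        prev = r

def rule_seq_logprob(rules):
    """Score a derivation as a product of rule-to-rule transition probabilities."""
    prev, logp = "<s>", 0.0
    for r in rules:
        # Add-one smoothing over a crude stand-in for the rule inventory size.
        p = (bigram[(prev, r)] + 1.0) / (unigram[prev] + len(unigram) + 1.0)
        logp += math.log(p)
        prev = r
    return logp

print(rule_seq_logprob(["R_NP->DT_NN", "R_DT->the", "R_NN->book"]))
```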
Good language models are important for ensuring the fluency of translated sentences. Because language models are trained on very large amounts of data, the number of parameters in standard n-gram language models can grow very quickly, making parameter learning difficult. Neural network language models (NNLMs) can capture distributions over sentences with many fewer parameters. We will present recent work on efficiently learning large-scale, large-vocabulary NNLMs; a toy example of an NNLM appears below. Integrating these NNLMs into a hierarchical phrase-based MT decoder improves translation quality significantly.
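For readers unfamiliar with NNLMs, the sketch below shows a tiny feed-forward n-gram neural language model scoring a word given its context, illustrating how a fixed set of embedding and weight matrices replaces an explicit n-gram count table. All sizes, the toy vocabulary, and the plain softmax output layer are illustrative assumptions; they are not the large-scale models described in the talk.

```python
# Illustrative sketch: a tiny feed-forward n-gram neural language model.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "the", "house", "book", "a", "</s>"]
V, d, h, context = len(vocab), 16, 32, 2   # vocab size, embedding dim, hidden dim, n-1

E = rng.normal(0, 0.1, (V, d))             # word embeddings
W1 = rng.normal(0, 0.1, (context * d, h))  # context -> hidden weights
W2 = rng.normal(0, 0.1, (h, V))            # hidden -> output scores

def nnlm_prob(context_ids, next_id):
    """P(next word | context) under the toy feed-forward model."""
    x = np.concatenate([E[i] for i in context_ids])   # concatenated context embeddings
    hidden = np.tanh(x @ W1)                          # hidden layer
    scores = hidden @ W2
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                              # softmax normalization
    return probs[next_id]

# Example: P("book" | "<s>", "the") under the (untrained) model.
print(nnlm_prob([vocab.index("<s>"), vocab.index("the")], vocab.index("book")))
```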
Location: Grace Ford Salvatori Hall Of Letters, Arts & Sciences (GFS) - 111
Audiences: Everyone Is Invited
Contact: Lizsl De Leon