NL Seminar - Beyond Parallel Data - A Decipherment Approach for Better Quality Machine Translation
Fri, Aug 14, 2015 @ 03:00 PM - 04:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Qing Dou, USC/ISI
Talk Title: A Decipherment Approach for Better Quality Machine Translation
Series: Natural Language Seminar
Abstract: Thanks to the availability of parallel data and advances in machine learning techniques, we have seen tremendous improvement in the field of machine translation over the past 20 years. However, due to a lack of parallel data, the quality of machine translation is still far from satisfactory for many language pairs and domains. In general, it is easier to obtain non-parallel data, and much work has tried to learn translations from non-parallel data. Nonetheless, improvements to machine translation have been limited. In this work, I follow a decipherment approach to learn translations from non-parallel data and achieve significant gains in machine translation.
I apply slice sampling to Bayesian decipherment. Compared with the state-of-the-art algorithm, the new approach is highly scalable and accurate, making it possible to decipher billions of tokens with hundreds of thousands of word types at high accuracy for the first time. When it comes to deciphering foreign languages, I introduce dependency relations to address the problems of word reordering, insertion, and deletion. Experiments show that dependency relations improve Spanish/English deciphering accuracy more than fivefold. Moreover, this accuracy is further doubled when word embeddings are used to incorporate more contextual information.
In addition, I decipher large amounts of monolingual data to improve state-of-the-art machine translation systems in the scenarios of domain adaptation and low-density languages. Through experiments, I show that decipherment finds high-quality translations for out-of-vocabulary words in the task of domain adaptation, and helps improve word alignment when the amount of parallel data is limited. I observe BLEU gains of up to 3.8 points and 1.9 points in Spanish/French and Malagasy/English machine translation experiments, respectively.
Biography: Qing is a PhD candidate at USC. His research interests focus on applying machine learning techniques to help computers better understand human languages. He is working with Kevin Knight on various problems related to Machine Translation and Decipherment. Prior to that, he worked on computational phonology, including stress prediction and transliteration. He is interested in continuing his research in industrial settings to solve exciting large-scale problems.
Host: Nima Pourdamghani and Kevin Knight
More Info: http://nlg.isi.edu/nl-seminar/
Location: Information Sciences Institute (ISI) - 6th Flr Conf Rm # 689, Marina Del Rey
Audiences: Everyone Is Invited
Contact: Peter Zamar
Event Link: http://nlg.isi.edu/nl-seminar/