NL Seminar - 1. IMPROVING LOW RESOURCE NEURAL MACHINE TRANSLATION 2. LANGUAGE-INDEPENDENT TRANSLATION OF OUT OF VOCABULARY WORDS
Fri, Sep 08, 2017 @ 03:00 PM - 04:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Nelson Liu and Leon Cheung, USC/ISI
Talk Title: 1. IMPROVING LOW RESOURCE NEURAL MACHINE TRANSLATION 2. LANGUAGE-INDEPENDENT TRANSLATION OF OUT OF VOCABULARY WORDS
Series: Natural Language Seminar
Abstract: 1. Statistical models outperformed neural models in machine translation until recently, when the sequence-to-sequence neural model was introduced. However, this model's performance suffers greatly when it is starved of bilingual parallel data. This talk will discuss several strategies for overcoming this low-resource challenge, including modifications to the sequence-to-sequence model, transfer learning, data augmentation, and the use of monolingual data.
2. Neural machine translation is effective for language pairs with large datasets, but falls short of traditional methods (e.g., phrase-based or syntax-based machine translation) in the low-resource setting. However, these classic approaches struggle to translate out-of-vocabulary tokens, a limitation that is amplified when there is little training data. In this work, we augment a syntax-based machine translation system with a module that provides translations of out-of-vocabulary tokens. We present several language-independent strategies for translating unknown tokens, and benchmark their accuracy on an intrinsic out-of-vocabulary translation task across a typologically diverse dataset of sixteen languages. Lastly, we explore how using the module to add rules to a syntax-based machine translation system affects overall translation quality.
Biography: Leon Cheung is a second-year undergraduate at UC San Diego. This summer he has been working with Jon May and Kevin Knight to improve neural machine translation for low-resource languages.
Nelson Liu is an undergraduate at the University of Washington, where he works with Professor Noah Smith. His research interests lie at the intersection of machine learning and natural language processing. Previously, he worked at the Allen Institute for Artificial Intelligence on machine comprehension. He is currently a summer intern at ISI working with Professors Kevin Knight and Jonathan May.
Host: Marjan Ghazvininejad and Kevin Knight
More Info: http://nlg.isi.edu/nl-seminar/
Location: Information Sciences Institute (ISI) - 11th Flr Conf Rm # 1135, Marina Del Rey
Audiences: Everyone Is Invited
Contact: Peter Zamar
Event Link: http://nlg.isi.edu/nl-seminar/