NL Seminar - Decipherment for Universal Language Tools: A Case Study for Unsupervised Part-of-Speech Induction
Fri, Aug 17, 2018 @ 03:00 PM - 04:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Ronald Cardenas, USC
Talk Title: Decipherment for Universal Language Tools: A Case Study for Unsupervised Part-of-Speech Induction
Series: Natural Language Seminar
Abstract: Unsupervised Part-of-Speech (POS) induction can be viewed as a two-step task. The first step infers a sequence of states, while the second step maps this sequence to an actual POS sequence at training or testing time. The second step therefore requires reference tagged data, a luxury that low-resource target languages might not have. In this talk, we present an alternative approach to the second step, modeling it as a decipherment problem in which the ciphered text is the sequence of states and the original text we want to recover is the POS sequence. This approach requires no reference data in the target language and allows us to leverage POS sequences from higher-resource languages. Our experiments show that our approach benefits the most from simple strategies for inferring state sequences, such as Brown clustering. This allows our method to obtain reasonable performance in low-resource and limited-time scenarios.
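As a rough illustration of the decipherment view described in the abstract, the toy sketch below treats induced state IDs as cipher symbols and POS tags as plaintext. It is not the speaker's system. It assumes a deterministic one-to-one substitution key, a smoothed bigram model over POS tags estimated from tagged text in another language, and a brute-force search over keys, which is only feasible for a handful of states. The function names (train_bigram_lm, score, decipher) and the example data are hypothetical.

```python
import itertools
import math
from collections import Counter


def train_bigram_lm(tag_sequences, tagset, alpha=0.1):
    """Return an add-alpha smoothed bigram log-probability function over POS tags."""
    bigrams, contexts = Counter(), Counter()
    for seq in tag_sequences:
        padded = ["<s>"] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab_size = len(tagset)

    def logprob(prev, cur):
        return math.log(
            (bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * vocab_size)
        )

    return logprob


def score(tag_seq, logprob):
    """Log-probability of one tag sequence under the bigram model."""
    padded = ["<s>"] + list(tag_seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))


def decipher(state_sequences, tagset, logprob):
    """Find the one-to-one state-to-tag key whose deciphered output is most probable.

    Assumes there are no more distinct states than tags (toy setting only).
    """
    states = sorted({s for seq in state_sequences for s in seq})
    best_key, best_score = None, -math.inf
    for perm in itertools.permutations(tagset, len(states)):
        key = dict(zip(states, perm))
        total = sum(score([key[s] for s in seq], logprob) for seq in state_sequences)
        if total > best_score:
            best_key, best_score = key, total
    return best_key


if __name__ == "__main__":
    tagset = ["DET", "NOUN", "VERB"]
    # Tagged sequences standing in for a higher-resource language (made-up data).
    lm = train_bigram_lm(
        [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB", "DET", "NOUN"]], tagset
    )
    # Induced state IDs for the target language (e.g. cluster IDs); no tags needed.
    print(decipher([[7, 3, 5], [7, 3, 5, 7, 3]], tagset, lm))
```

A realistic setting would have many more states than tags and a noisy, many-to-one mapping, so an exhaustive search over keys like this one would need to be replaced by a probabilistic channel model and approximate inference.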
Biography: Ronald Cardenas is a Master's student in the Language and Communication Technologies programme at Charles University in Prague. His research interests span morphological analysis and parsing of low-resource languages. At ISI, he works with Jonathan May on developing universal language tools.
Host: Nanyun Peng
More Info: http://nlg.isi.edu/nl-seminar/
Audiences: Everyone Is Invited
Contact: Peter Zamar