AI SEMINAR
Fri, Jun 07, 2013 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Benjamin Snyder, Assistant Professor, UW-Madison
Talk Title: Dead Languages, and Elbow Grease: Minimally Supervised NLP Models.
Series: AISeminar
Abstract: I will discuss new techniques for inducing accurate statistical models for low-resource languages. In the extreme case of language decipherment, we are given a text with no knowledge of its language or writing system, and our goal is to identify the phonetic properties of its characters. I will present a Bayesian model that predicts whether letters are consonants or vowels with over 99% accuracy across 503 languages. The model assumes that languages are grouped into latent clusters with shared phonotactic regularities, and we perform posterior inference over the identities of these clusters and their shared parameters.
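The abstract does not spell out the Bayesian model, so the sketch below does not reproduce it. As a hedged illustration of the consonant/vowel task itself, it implements Sukhotin's algorithm, a classic unsupervised baseline (swapped in here, not the speaker's method): every letter starts as a consonant, each letter is scored by its adjacent-letter co-occurrences with consonants minus those with vowels, and the highest-scoring consonant is relabeled a vowel until no consonant has a positive score. The function name `sukhotin_vowels` and the toy word list are hypothetical.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Hypothetical helper: return the letters Sukhotin's algorithm labels as vowels.

    Illustrative baseline only; not the Bayesian model described in the talk.
    """
    # Symmetric counts of letters that appear next to each other inside a word.
    counts = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:  # identical neighbours (the matrix diagonal) are ignored
                counts[a][b] += 1
                counts[b][a] += 1

    # Every letter starts as a consonant, so its score is its total adjacency count.
    sums = {c: sum(counts[c].values()) for c in letters}
    vowels = set()
    while True:
        consonants = [c for c in letters if c not in vowels]
        if not consonants:
            break
        best = max(consonants, key=lambda c: sums[c])
        if sums[best] <= 0:
            break  # no remaining consonant prefers consonant neighbours
        vowels.add(best)
        # Adjacency to the new vowel now counts against a letter rather than for it.
        for c in consonants:
            if c != best:
                sums[c] -= 2 * counts[c][best]
    return vowels


print(sorted(sukhotin_vowels(["papa", "mama", "tata", "pipi", "titi"])))
```

On this toy input the sketch labels "a" and "i" as vowels. The heuristic needs reasonably large text samples; on very small inputs it can mislabel frequent consonants.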
In a more common scenario, we want to build a model (e.g., for a low-resource language) while minimizing annotation effort. I will present a method based on matrix projections that quickly identifies an optimal set of examples to label. This method outperforms active learning while obviating the need for incremental retraining and bootstrapping. We report error reductions of 25-40% on pronunciation modeling and part-of-speech tagging.
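The abstract does not describe the projection method in detail. Purely as a hypothetical illustration of what projection-based example selection can look like (a generic greedy pivoted Gram-Schmidt, not the speaker's algorithm), the sketch below repeatedly picks the feature vector with the largest component outside the span of the vectors already chosen, then projects that direction out of every remaining vector. No model is retrained between picks. The function name `select_by_projection` and the toy feature vectors are assumptions.

```python
def select_by_projection(rows, k):
    """Hypothetical sketch: greedily pick up to k row indices that best span the data.

    Generic pivoted Gram-Schmidt selection; not the method presented in the talk.
    """
    # Work on float copies so the caller's data is left untouched.
    residuals = [[float(x) for x in r] for r in rows]
    chosen = []
    for _ in range(min(k, len(rows))):
        # Squared norm of each row's component outside the span chosen so far.
        norms = [sum(x * x for x in r) for r in residuals]
        best = max(range(len(rows)), key=lambda i: norms[i])
        if norms[best] <= 1e-12:
            break  # every remaining row already lies in the span of the chosen rows
        chosen.append(best)
        # Unit vector along the chosen row's residual direction.
        scale = norms[best] ** 0.5
        q = [x / scale for x in residuals[best]]
        # Remove that direction from every residual (one Gram-Schmidt step).
        for r in residuals:
            dot = sum(a * b for a, b in zip(r, q))
            for j in range(len(r)):
                r[j] -= dot * q[j]
    return chosen


print(select_by_projection([[1, 0, 0], [0, 2, 0], [1, 1, 0], [0, 0, 0.5]], 3))
```

Here the sketch picks rows 1, 0, and 3 and skips row 2, which adds no new direction once rows 0 and 1 are chosen. Selecting by residual norm favors examples that cover unexplored directions of the feature space, which is one simple way to choose a labeling set up front without the retrain-and-query loop of active learning.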
Biography: Benjamin Snyder is an assistant professor of computer science at UW-Madison. He completed his PhD at MIT in 2010 and received the George M. Sprowls Award for the best PhD thesis in computer science at MIT, as well as an honorable mention for the ACM Dissertation Award. He will be visiting ISI for all of June.
Host: David Chiang
Location: Information Sciences Institute (ISI) - 11th Floor, Large Conference Room
Webcast: http://webcasterms1.isi.edu/mediasite/SilverlightPlayer/Default.aspx?peid=ec818feeeb5b458e87121185990150ef1d
Audiences: Everyone Is Invited