Events for July
-
NL Seminar- Victor Chahuneau: "Translating into Morphologically Rich Languages with Synthetic Phrases"
Wed, Jul 10, 2013 @ 03:00 PM - 04:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Victor Chahuneau, CMU
Talk Title: "Translating into Morphologically Rich Languages with Synthetic Phrases"
Series: Natural Language Seminar
Abstract: Translation into morphologically rich languages is an important but recalcitrant problem in machine translation. When confronted with the large vocabulary sizes resulting from various morphological phenomena, the independence assumptions made by standard translation models mean that vast amounts of parallel training data (which do not generally exist) would be necessary to reliably estimate the numerous required parameters. On the other hand, previous attempts to remedy this situation have been unsatisfying either because they were highly language-dependent, or because they failed from a modeling perspective (e.g., they improved performance on long-tail types at the expense of frequent types).
We present a simple and effective approach that deals with the problem in two phases. First, a discriminative model is learned to predict inflections of target words from rich source-side annotations. Then, this model is used to create additional sentence-specific phrases that are added to a standard translation model prior to decoding. Our approach relies on morphological analysis of the target language but we show that an unsupervised Bayesian model can also be used in place of a standard supervised analyzer. We report significant improvements in translation quality when translating from English to Russian, Hebrew and Swahili.
Biography: http://victor.chahuneau.fr/
Host: Qing Dou
More Info: http://nlg.isi.edu/nl-seminar/
Location: Information Sciences Institute (ISI) - Conf Rm # 689, Marina Del Rey
Audiences: Everyone Is Invited
Contact: Peter Zamar
Event Link: http://nlg.isi.edu/nl-seminar/
This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.
-
NL Seminar- Daniel Bauer: "Understanding Descriptions of Visual Scenes Using Graph Grammars"
Fri, Jul 12, 2013 @ 03:00 PM - 04:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Daniel Bauer, Columbia University
Talk Title: "Understanding Descriptions of Visual Scenes Using Graph Grammars"
Series: Natural Language Seminar
Abstract: I will present work on the interpretation of descriptions of visual scenes such as ‘A man is sitting on a chair and using the computer’. One application of this research is the automatic generation of 3D scenes, which provides a way for non-artists to create graphical content and has wide-ranging applications in entertainment and education.
The core task of text-to-scene generation involves understanding the high-level content of a description and translating it into a low-level representation of a 3D scene as a set of relations between pre-existing 3D models. Linguistic, spatial, and world-knowledge inference is required at different levels of this process.
My talk will present VigNet, a repository of lexical and world knowledge needed for text-to-scene generation, which is based on FrameNet. I will also describe how visual scenes can be represented as directed graphs and how information in VigNet can be encoded in Synchronous Hyperedge Replacement Grammars to enable semantic parsing and generation of a scene.
Biography: Daniel Bauer is a PhD candidate at Columbia University. His research interests include lexical and computational semantics, semantic parsing, and formal grammars in syntax and semantics. He is a co-founder of WordsEye Inc, a company that aims to make text-to-3D-scene generation available to everyone on social media. Daniel is currently an intern at ISI for the second summer in a row. He received his undergrad degree in Cognitive Science from the University of Osnabrück, Germany, and an MSc in Language Science and Technology from Saarland University.
Host: Qing Dou
More Info: http://nlg.isi.edu/nl-seminar/
Location: Information Sciences Institute (ISI) - 11th Flr Conf Rm # 1135, Marina Del Rey
Audiences: Everyone Is Invited
Contact: Peter Zamar
Event Link: http://nlg.isi.edu/nl-seminar/
-
AI Seminar- Martha Palmer: "Annotating Resources for the Clinical Domain"
Fri, Jul 19, 2013 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Martha Palmer, University of Colorado, Boulder
Talk Title: "Annotating Resources for the Clinical Domain"
Series: Artificial Intelligence Seminar
Abstract: In the general domain, large-scale linguistic annotation of syntactic structure and semantic labels fostered truly revolutionary advances in natural language processing systems. The availability of a similar large annotated resource for clinical language would enable equivalent progress in this domain by advancing methods development through rule-based and statistical approaches, involving a larger research community in the study of difficult NLP problems, and porting best-of-breed methodologies to healthcare.
Under the Strategic Health Advanced Research Project Area 4 (SHARP 4; www.sharpn.org) and the THYME NIH grant (1 R01 LM010090-01A1, Temporal Relation Discovery for Clinical Text, PI: Savova), 500,000 tokens of clinical narrative spread across specialties, patients, note types and three sites (Mayo Clinic, Seattle Group Health Cooperative and Intermountain Health Care) are being annotated. Linguistic annotations comprise constituency parses, dependency parses, semantic role labels, coreference and temporal relations and are being done at the University of Colorado. In addition, domain-specific entity and relation annotations are being done following UMLS guidelines jointly between Colorado, Harvard and the Mayo Clinic.
There are many challenges in porting general domain annotation schemes to the clinical domain, due to the fragmentary, informal style of the text and the domain-specific terminology. It is also important to create diverse annotation datasets and to explore more efficient methodologies for porting to new domains, such as active learning. This talk will describe the status of the annotation effort, how the challenges are being addressed, the performance improvements observed in the newly trained components, and experiments with active learning for smart data selection.
Biography: Martha Palmer is a Full Professor at the University of Colorado with joint appointments in Linguistics and Computer Science and is an Institute of Cognitive Science Faculty Fellow. She recently won a Boulder Faculty Assembly 2010 Research Award and was the Director of the 2011 Linguistics Institute in Boulder, CO. Her research has been focused on trying to capture elements of the meanings of words that can comprise automatic representations of complex sentences and documents. Supervised machine learning techniques rely on vast amounts of annotated training data, so she and her students are engaged in providing data with word sense tags and semantic role labels for English, Chinese, Arabic, Hindi, and Urdu, funded by DARPA and NSF. They also train automatic sense taggers and semantic role labelers, and extract bilingual lexicons from parallel corpora. A more recent focus is the application of these methods to biomedical journal articles and clinical notes, funded by NIH. She is a co-editor for the Journal of Natural Language Engineering and for LiLT, Linguistic Issues in Language Technology, and on the CLJ Editorial Board. She is a past President of the Association for Computational Linguistics and a past Chair of SIGLEX and SIGHAN.
Host: David Chiang
More Info: http://webcasterms1.isi.edu/mediasite/SilverlightPlayer/Default.aspx?peid=ad22eb4390944a439d0e1eeada255aa21d
Webcast: TBA
Location: Information Sciences Institute (ISI) - 11th Flr Conf Rm # 1135
WebCast Link: TBA
Audiences: Everyone Is Invited
Contact: Peter Zamar
Event Link: http://webcasterms1.isi.edu/mediasite/SilverlightPlayer/Default.aspx?peid=ad22eb4390944a439d0e1eeada255aa21d
-
NL Seminar- Jackie Lee: "Bayesian Approaches to Acoustic Model and Pronunciation Lexicon Discovery"
Fri, Jul 19, 2013 @ 03:00 PM - 04:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Jackie Lee, MIT
Talk Title: "Bayesian Approaches to Acoustic Model and Pronunciation Lexicon Discovery"
Series: Natural Language Seminar
Abstract: In the first part of the talk, we investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in which each mixture component is an HMM that represents a sub-word unit. We apply our model to the TIMIT corpus, and the results demonstrate that our model discovers phone units that are highly correlated with English phones and produces better segmentation than the state-of-the-art baselines. We test the quality of the learned acoustic models on a spoken term detection task. Compared to the baseline, our model is able to improve the detection precision of top hits by a large margin.
The creation of a pronunciation lexicon remains the most inefficient process in developing an automatic speech recognizer. In the second part of the talk, we discuss an unsupervised alternative to the conventional manual approach for creating pronunciation dictionaries. We present a hierarchical Bayesian model, which jointly discovers the phonetic inventory and the Letter-to-Sound (L2S) mapping rules in a language using only transcribed data. When tested on a corpus of spontaneous queries, our results demonstrate the superiority of the proposed joint learning scheme over its sequential counterpart, in which the latent phonetic inventory and L2S mappings are learned separately. Furthermore, the recognizers built with the automatically induced lexicon consistently outperform grapheme-based recognizers and even approach the performance of recognition systems trained using conventional supervised procedures.
Biography: http://groups.csail.mit.edu/sls/people/clee.shtml
Host: Qing Dou
More Info: http://nlg.isi.edu/nl-seminar/
Location: Information Sciences Institute (ISI) - 11th Flr Conf Rm # 1135, Marina Del Rey
Audiences: Everyone Is Invited
Contact: Peter Zamar
Event Link: http://nlg.isi.edu/nl-seminar/