AI Seminar - Martha Palmer: "Annotating Resources for the Clinical Domain"
Fri, Jul 19, 2013 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Martha Palmer, University of Colorado, Boulder
Talk Title: "Annotating Resources for the Clinical Domain"
Series: Artificial Intelligence Seminar
Abstract: In the general domain, large-scale linguistic annotation of syntactic structure and semantic labels fostered truly revolutionary advances in natural language processing systems. The availability of a similar large annotated resource for clinical language would enable equivalent progress in this domain by advancing methods development through rule-based and statistical approaches, involving a larger research community in the study of difficult NLP problems, and porting best-of-breed methodologies to healthcare.
Under the Strategic Health Advanced Research Project Area 4 (SHARP 4; www.sharpn.org) and the THYME NIH grant (1 R01 LM010090-01A1, Temporal Relation Discovery for Clinical Text, PI: Savova), 500,000 tokens of clinical narrative spread across specialties, patients, note types and three sites (Mayo Clinic, Seattle Group Health Cooperative and Intermountain Health Care) are being annotated. Linguistic annotations comprise constituency parses, dependency parses, semantic role labels, coreference and temporal relations and are being done at the University of Colorado. In addition, domain-specific entity and relation annotations are being done following UMLS guidelines jointly between Colorado, Harvard and the Mayo Clinic.
There are many challenges in porting general domain annotation schemes to the clinical domain, due to the fragmentary, informal style of the text and the domain-specific terminology. It is also important to create diverse annotation datasets and to explore more efficient methodologies for porting to new domains, such as active learning. This talk will describe the status of the annotation effort, how the challenges are being addressed, the performance improvements observed in the newly trained components, and experiments with active learning for smart data selection.
Biography: Martha Palmer is a Full Professor at the University of Colorado with joint appointments in Linguistics and Computer Science and is an Institute of Cognitive Science Faculty Fellow. She recently won a Boulder Faculty Assembly 2010 Research Award and was the Director of the 2011 Linguistics Institute in Boulder, CO. Her research has been focused on trying to capture elements of the meanings of words that can comprise automatic representations of complex sentences and documents. Supervised machine learning techniques rely on vast amounts of annotated training data, so she and her students are engaged in providing data with word sense tags and semantic role labels for English, Chinese, Arabic, Hindi, and Urdu, funded by DARPA and NSF. They also train automatic sense taggers and semantic role labelers, and extract bilingual lexicons from parallel corpora. A more recent focus is the application of these methods to biomedical journal articles and clinical notes, funded by NIH. She is a co-editor for the Journal of Natural Language Engineering and for LiLT, Linguistic Issues in Language Technology, and is on the CLJ Editorial Board. She is a past President of the Association for Computational Linguistics and a past Chair of SIGLEX and SIGHAN.
Host: David Chiang
More Info: http://webcasterms1.isi.edu/mediasite/SilverlightPlayer/Default.aspx?peid=ad22eb4390944a439d0e1eeada255aa21d
Webcast: TBA
Location: Information Sciences Institute (ISI) - 11th Flr Conf Rm # 1135
WebCast Link: TBA
Audiences: Everyone Is Invited
Contact: Peter Zamar
Event Link: http://webcasterms1.isi.edu/mediasite/SilverlightPlayer/Default.aspx?peid=ad22eb4390944a439d0e1eeada255aa21d