Fri, Jul 10, 2020 @ 11:00 AM - 12:00 PM
Information Sciences Institute, USC Viterbi School of Engineering
This week's talk will be by Marjorie Freedman, a Research Team Lead from USC ISI.
Task Specific Data Annotation for COVID-19
Abstract: Information Extraction seeks to transform natural language text into structured records that capture key entities, relations, and events. Typical approaches to information extraction require annotators to construct domain-specific mark-up for each relation, event, and entity type of interest. This limits the applicability of information extraction in new domains, such as organizing scientific literature to suit the needs of COVID researchers. In this work, we explore alternative approaches to creating domain-specific annotation for new entities, relations, and events of interest. We provide annotators with tools to search for and label events of interest to COVID researchers, and we provide flexibility in annotation to capture language that is suggestive of, but not definitive for, the concepts of interest. We have begun annotation of several relations in the CORD-19 (https://allenai.org/data/cord-19) dataset using this approach, and plan to make our results available.
Bio: Marjorie Freedman, a Research Team Lead at ISI, has degrees in linguistics and computer science from Cornell University. At ISI, she serves as PI of DARPA's AIDA, KAIROS, and ASED efforts. Under DARPA's AIDA project, her work has included tailoring speech recognition and optical character recognition systems for use in an information extraction pipeline. Also under AIDA, she is exploring the impact of uncertainty in anaphora resolution on downstream tasks and working with vision researchers to understand and address the challenges of mapping the output of vision analytics to classic information extraction ontologies. Before joining ISI, she served as PI of IARPA SCIL and Metaphor efforts and as co-PI of BBN's DARPA DEFT and LORELEI efforts. As part of DEFT, she provided guidance in API development and served as the task coordinator for NIST's TAC 2014-16 Event Argument evaluations. As a part of this evaluation, she sought to identify a salient unit that could be evaluated and would be useful to downstream knowledge-focused tasks. As PI of IARPA SCIL, she developed algorithms to understand the implicit social content of language, for example, identifying persuasive language in online discussion threads. Her work in information extraction has explored how to address limited training data, including fusing rule-based and learned systems, exploring alternative approaches to annotation, and measuring the impact of coreference in bootstrap learning for information extraction.
Craig Knoblock, Executive Director, USC Information Sciences Institute
Bhaskar Krishnamachari, Director, USC Viterbi Center for CPS and IoT
Audiences: Everyone Is Invited
Contact: Bhaskar Krishnamachari