Logo: University of Southern California

Events Calendar


  • NL Seminar-Learning Open Domain Knowledge From Text

    Fri, Jan 15, 2016 @ 03:00 PM - 04:00 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars


    Speaker: Gabor Angeli, Stanford University

    Talk Title: Learning Open Domain Knowledge From Text

    Series: Natural Language Seminar

    Abstract: The increasing availability of large text corpora holds the promise of acquiring an unprecedented amount of knowledge from this text. However, current techniques are either specialized to particular domains or do not scale to large corpora. This dissertation develops a new technique for learning open-domain knowledge from unstructured web-scale text corpora. A first application aims to capture common sense facts: given a candidate statement about the world and a large corpus of known facts, is the statement likely to be true? We appeal to a probabilistic relaxation of natural logic -- a logic which uses the syntax of natural language as its logical formalism -- to define a search problem from the query statement to its appropriate support in the knowledge base over valid (or approximately valid) logical inference steps. We show a 4x improvement at retrieval recall compared to lemmatized lookup, maintaining above 90% precision. This approach is extended to handle longer, more complex premises by segmenting these utterance into a set of atomic statements entailed through natural logic. We evaluate this system in isolation by using it as the main component in an Open Information Extraction system, and show that it achieves a 3% absolute improvement in F1 compared to prior work on a competitive knowledge base population task. A remaining challenge is elegantly handling cases where we could not find a supporting premise for our query. To address this, we create an analogue of an evaluation function in game playing search: a shallow lexical classifier is folded into the search program to serve as a heuristic function to assess how likely we would have been to find a premise. Results on answering 4th grade science questions show that this method improves over both the classifier in isolation and a strong IR baseline, and achieves the best published results on the task.



    Biography: Gabor is a new graduate from Chris Manning's natural language processing lab. He graduated with a BS in electrical engineering/computer science from UC Berkeley in 2010, and defended his Ph.D. in the fall of 2015. His research focuses on natural language understanding, ranging from relation extraction and knowledge base population, textual entailment, common-sense reasoning, and question answering. He has led the Stanford knowledge base population project for the past three years, with Stanford ranking 5th, 1st, and 1st (tied) among teams participating in the TAC-KBP competition over those three years. In addition to publications at ACL, EMNLP and NAACL, he co-authored an EMNLP best dataset paper on collecting a large dataset for textual entailment. Outside of academia, he was the NLP architect for Baarzo in 2014 (acquired by Google), and is currently a fellow at XSeed Capital. In his free time, Gabor enjoys hiking, board games, and binge-watching Netflix shows.


    Host: Xing Shi and Kevin Knight

    More Info: http://nlg.isi.edu/nl-seminar/

    Location: Information Science Institute (ISI) - 6th Flr Conf Rm # 689, Marina Del Rey

    Audiences: Everyone Is Invited

    Contact: Peter Zamar

    Event Link: http://nlg.isi.edu/nl-seminar/

    Add to Google CalendarDownload ICS File for OutlookDownload iCal File

Return to Calendar