Thu, Jan 13, 2011 @ 03:30 PM - 05:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Dr. Zornitsa Kozareva, ISI, USC: Natural Language (Lexical Semantics)
Talk Title: Learning the Encyclopedia of the World using the Web
Abstract: How can we automatically build the Encyclopedia of the World, one that contains not only high-level information such as that found in Wikipedia, but also specific facts that answer questions such as "Who appeared in a concert at the Hollywood Bowl last night?" This is a challenging problem that remains unsolved even though many researchers have worked on it. In this talk, I will present novel algorithms for information gathering, sifting and organization that can rapidly, accurately and completely cover any area of interest by mining unstructured text on the Web. I will describe a semi-supervised bootstrapping procedure, which uses a recursive lexico-syntactic pattern and an instance of a given semantic relation to scan billions of Web documents and automatically harvest and taxonomize thousands to millions of new instances, facts and semantic relations. I will then describe graph-based algorithms used to validate and rank the harvested knowledge. Finally, I will show that the algorithms (1) outperform state-of-the-art systems like KnowItAll and YAGO, (2) enrich existing human-built knowledge repositories like WordNet, and (3) accurately reconstruct taxonomies starting from scratch. The developed search technology has shown that it is possible to begin building the Encyclopedia of the World and has opened up new directions for research.
Biography: Zornitsa Kozareva is a Research Scientist in the Natural Language group at the Information Sciences Institute, University of Southern California (USC/ISI). She received her PhD cum laude from the University of Alicante, Spain. Her research interests lie in Web-based knowledge acquisition, text mining, lexical semantics, ontology population and multilingual information extraction. In 2010, Zornitsa co-organized SemEval, one of the biggest challenges in the area of semantics. She also co-organized the CCIACADA/VACCINE Reconnect Conference. She was the leader of the team that won the answer validation challenge (AVE-2006) for French and Italian, and a member of the team that won the Spanish Geographic Information Retrieval (GeoClef-2006) challenge.
Host: Prof. Aiichiro Nakano
Location: SSL 150
Audiences: Everyone Is Invited
Contact: Kanak Agrawal