PhD Dissertation Defense - Binh Vu
Fri, May 17, 2024 @ 03:00 PM - 05:00 PM
Thomas Lord Department of Computer Science
University Calendar
Title: Exploiting Web Tables and Knowledge Graphs for Creating Semantic Descriptions of Data Sources
Committee: Craig Knoblock (Chair), Sven Koenig, Daniel Edmund O'Leary, Yolanda Gil, Jay Pujara
Date and Time: Friday, May 17th - 3:00p - 5:00p
Location: SAL 322
Abstract: There is an enormous number of tables available on the web, and they can provide valuable information for diverse applications. To harvest information from the tables, we need precise mappings, called semantic descriptions, of concepts and relationships in the data to classes and properties in a target ontology. However, creating semantic descriptions, or semantic modeling, is a complex task requiring considerable manual effort and expertise. Much research has focused on automating this problem. However, existing supervised and unsupervised approaches both face various difficulties. The supervised approaches require lots of known semantic descriptions for training and, thus, are hard to apply to a new or large domain ontology. On the other hand, the unsupervised approaches exploit the overlapping data between tables and knowledge graphs; hence, they perform poorly on tables with lots of ambiguity or little overlapping data. To address the aforementioned weaknesses, we present novel approaches for two main cases: tables that have overlapping data with a knowledge graph (KG) and tables that do not have overlapping data. Exploiting web tables that have links to entities in a KG, we automatically create a labeled dataset to learn to combine table data, metadata, and overlapping background knowledge (if available) to find accurate semantic descriptions. Our methods for the two cases together provide a comprehensive solution to the semantic modeling problem. In the evaluation, our approach in the overlapping setting yields an improvement of approximately 5% in F1 scores compared to the state-of-the-art methods. In the non-overlapping setting, our approach outperforms strong baselines by 10% to 30% in F1 scores.
Location: Henry Salvatori Computer Science Center (SAL) - 322
Audiences: Everyone Is Invited
Contact: Felante' Charlemagne