Jinwoo Kim: A Statistical Ontology-Based Approach to Ranking for Multi-Word Search
Tue, Jan 22, 2013 @ 12:00 PM - 02:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Jinwoo Kim, USC Computer Science; PhD
Talk Title: A Statistical Ontology-Based Approach to Ranking for Multi-Word Search
Series: PhD Defense Announcements
Abstract: Title: A STATISTICAL ONTOLOGY-BASED APPROACH TO RANKING FOR MULTI-WORD SEARCH
Candidate: Jinwoo Kim
Department: Computer Science
Date: January 22nd
Time: 12:00pm
Location: SAL 222
Committee:
Dennis McLeod (chair)
Aiichiro Nakano
Larry Pryor
Abstract
Keyword search is a prominent data retrieval method for the Web, largely because keyword processing is simple and efficient, which allows a large amount of information to be searched with fast response. However, keyword search approaches do not formally capture the meaning of a keyword query and fail to address the semantic relationships between keywords. As a result, the accuracy (precision and recall) is often unsatisfactory, and the ranking algorithms fail to properly reflect the semantic relevance of keywords.
Our research focuses on increasing the accuracy of search results for multi-word search. We propose a statistical ontology-based semantic ranking algorithm based on sentence units, and a new type of query interface that includes wildcards. First, we assign higher ranking scores to keywords located in the same sentence than to keywords located in separate sentences. While existing statistical search algorithms such as N-gram models consider only sequences of adjacent keywords, our approach is able to account for sequences of non-adjacent keywords as well as adjacent keywords.
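The sentence-unit idea described above can be illustrated with a minimal sketch. It is not the algorithm from the dissertation: the function names, the naive sentence splitter, and the weights 2.0 and 1.0 are all assumptions chosen for illustration. It only shows how two keywords that share a sentence, adjacent or not, can be scored above keywords that appear in separate sentences.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def score_document(text, keywords, same_sentence_weight=2.0, cross_sentence_weight=1.0):
    """Score a document for a two-word query (hypothetical scoring rule).

    Each sentence containing both keywords, adjacent or not, adds
    same_sentence_weight. If no sentence contains both but the document
    contains each keyword somewhere, the score is cross_sentence_weight.
    """
    first, second = (k.lower() for k in keywords)
    sentences = [set(tokenize(s)) for s in split_sentences(text)]
    same = sum(1 for tokens in sentences if first in tokens and second in tokens)
    if same:
        return same_sentence_weight * same
    has_first = any(first in tokens for tokens in sentences)
    has_second = any(second in tokens for tokens in sentences)
    return cross_sentence_weight if (has_first and has_second) else 0.0
```

For the query ("keyword", "search"), a document containing "Search by a single keyword is common." would score higher under this sketch than one that mentions the two words in different sentences, even though the words are not adjacent.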
Second, we propose a slightly different type of query interface, which treats a wildcard as an independent unit of a search query. This lets query prediction reflect what users are actually seeking, because the prediction is based on actual Web data rather than on query data. Unlike current information retrieval approaches such as proximity, statistical language modeling, query prediction and query answering, our statistical ontology-based model synthesizes the proximity concept and statistical approaches into a form of ontology. This ontology helps to improve Web information retrieval accuracy.
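As a rough illustration of a wildcard treated as its own query unit, the sketch below fills the wildcard from document text rather than from past queries. Everything in it is an assumption made for the example: the function name predict_wildcard, the "*" syntax, the rule that a wildcard stands for exactly one token within a sentence, and the use of raw counts as the ranking signal. The abstract does not specify these details.

```python
import re
from collections import Counter


def predict_wildcard(query, documents, top_k=3):
    """Rank candidate fillers for '*' in a query by corpus frequency.

    Hypothetical rule: each query unit aligns with one token, '*' matches
    any single token, and a match must lie within one sentence.
    """
    parts = query.lower().split()
    counts = Counter()
    for doc in documents:
        # Same naive sentence splitting as any simple sentence-unit model.
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            tokens = re.findall(r"[a-z0-9]+", sentence.lower())
            for i in range(len(tokens) - len(parts) + 1):
                window = tokens[i:i + len(parts)]
                if all(p == "*" or p == t for p, t in zip(parts, window)):
                    # Collect only the tokens that landed on wildcard slots.
                    fill = tuple(t for p, t in zip(parts, window) if p == "*")
                    counts[fill] += 1
    return [(" ".join(fill), n) for fill, n in counts.most_common(top_k)]
```

Given a small corpus in which "statistical language model" occurs more often than "statistical topic model", the query "statistical * model" would return "language" ahead of "topic". The candidates come from the documents themselves, which is the contrast with query-log prediction that the abstract draws.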
We validated our methodology with a suite of experiments using the Text Retrieval Conference document collection. We focused on two-word queries in our experiments, as two-word queries are quite common. After applying our statistical ontology-based algorithm to the Nutch search engine, we compared the results with those of the original Nutch search and Google Desktop Search. The results demonstrate that our methodology improves accuracy significantly.
Host: Lizsl de Leon
Location: Henry Salvatori Computer Science Center (SAL) - 222
Audiences: Department Only
Contact: Assistant to CS chair