USC - Viterbi School of Engineering

Jinwoo Kim: A Statistical Ontology-Based Approach to Ranking for Multi-Word Search
Tue, Jan 22, 2013 @ 12:00 PM - 02:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars

Speaker: Jinwoo Kim, USC Computer Science; PhD

Talk Title: A Statistical Ontology-Based Approach to Ranking for Multi-Word Search

Series: PhD Defense Announcements

Abstract: Title: A STATISTICAL ONTOLOGY-BASED APPROACH TO RANKING FOR MULTI-WORD SEARCH

Candidate: Jinwoo Kim
Department: Computer Science

Date: January 22, 2013
Time: 12:00pm
Location: SAL 222

Committee:
Dennis McLeod (chair)
Aiichiro Nakano
Larry Pryor

Abstract

Keyword search is a prominent data retrieval method for the Web, largely because keyword processing is simple and efficient, allowing a large amount of information to be searched with fast response times. However, keyword search approaches do not formally capture the meaning of a keyword query and do not address the semantic relationships between keywords. As a result, accuracy (precision and recall) is often unsatisfactory, and ranking algorithms fail to properly reflect the semantic relevance of keywords.

Our research focuses on increasing the accuracy of search results for multi-word search. We propose a statistical ontology-based semantic ranking algorithm that operates on sentence units, and a new type of query interface that supports wildcards. First, we assign higher ranking scores to keywords located in the same sentence than to keywords located in separate sentences. While existing statistical search algorithms such as n-gram models consider only sequences of adjacent keywords, our approach handles sequences of non-adjacent keywords as well as adjacent ones.
Second, we propose a slightly different type of query interface, which treats a wildcard as an independent unit of a search query. This reflects what users are actually seeking through query prediction based on actual Web data rather than on query data. Unlike current information retrieval approaches such as proximity, statistical language modeling, query prediction, and query answering, our statistical ontology-based model synthesizes the proximity concept and statistical approaches into a form of ontology. This ontology helps to improve the accuracy of Web information retrieval.
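The two ideas above can be illustrated with a rough sketch. This is not the algorithm from the dissertation, whose details the abstract does not give. The function names, the weights 2.0 and 1.0, the naive sentence splitter, and the rule that a wildcard matches exactly one word between two keywords are all assumptions made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def score_document(text, kw1, kw2, same_sentence_weight=2.0, separate_weight=1.0):
    """Score a document for a two-word query.

    Hypothetical scoring rule: each sentence containing both keywords
    (adjacent or not) earns same_sentence_weight; a document where the
    keywords only appear in different sentences earns separate_weight once.
    """
    kw1, kw2 = kw1.lower(), kw2.lower()
    sentences = [set(tokenize(s)) for s in split_sentences(text)]
    together = sum(1 for s in sentences if kw1 in s and kw2 in s)
    if together:
        return same_sentence_weight * together
    has_kw1 = any(kw1 in s for s in sentences)
    has_kw2 = any(kw2 in s for s in sentences)
    return separate_weight if (has_kw1 and has_kw2) else 0.0


def predict_wildcard(corpus, left, right):
    """Rank candidate fillers for the query 'left * right'.

    Candidates are counted from the document text itself (not from query
    logs): any single word found between left and right in one sentence.
    """
    left, right = left.lower(), right.lower()
    counts = Counter()
    for text in corpus:
        for sentence in split_sentences(text):
            tokens = tokenize(sentence)
            for i in range(len(tokens) - 2):
                if tokens[i] == left and tokens[i + 2] == right:
                    counts[tokens[i + 1]] += 1
    return counts.most_common()


same = score_document("Semantic search improves ranking. Users like it.", "semantic", "ranking")
apart = score_document("Semantic methods exist. Ranking is hard.", "semantic", "ranking")
fillers = predict_wildcard(
    ["Web search engine results vary.",
     "A web crawl engine runs. The web search engine is fast."],
    "web", "engine",
)
```

Here `same` scores higher than `apart` because both keywords share a sentence even though they are not adjacent, which is the case an adjacent-only n-gram match would miss.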

We validated our methodology with a suite of experiments using the Text Retrieval Conference (TREC) document collection. We focused on two-word queries in our experiments, as two-word queries are quite common. After applying our statistical ontology-based algorithm to the Nutch search engine, we compared the results with those of the original Nutch search and Google Desktop Search. The results demonstrate that our methodology improves accuracy significantly.
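The abstract measures accuracy by precision and recall. As a reminder of how those two figures are computed for a single query, here is a minimal sketch using the standard definitions. The document IDs are made up, and nothing here reflects the actual experimental setup or results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 judged relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```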

Host: Lizsl de Leon

Location: Henry Salvatori Computer Science Center (SAL) - 222
Audiences: Department Only

Contact: Assistant to CS chair

This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.