University of Southern California

Events Calendar


  • Jinwoo Kim: A Statistical Ontology-Based Approach to Ranking for Multi-Word Search

    Tue, Jan 22, 2013 @ 12:00 PM - 02:00 PM

    Thomas Lord Department of Computer Science

    Conferences, Lectures, & Seminars


    Speaker: Jinwoo Kim, PhD candidate, USC Computer Science

    Talk Title: A Statistical Ontology-Based Approach to Ranking for Multi-Word Search

    Series: PhD Defense Announcements

    Abstract: Title: A STATISTICAL ONTOLOGY-BASED APPROACH TO RANKING FOR MULTI-WORD SEARCH

    Candidate: Jinwoo Kim
    Department: Computer Science

    Date: January 22nd
    Time: 12:00pm
    Location: SAL 222

    Committee:
    Dennis McLeod (chair)
    Aiichiro Nakano
    Larry Pryor

    Abstract

    Keyword search is a prominent data retrieval method for the Web, largely because simple, efficient keyword processing allows a large amount of information to be searched quickly. However, keyword search approaches do not formally capture the intended meaning of a keyword query and do not address the semantic relationships between keywords. As a result, accuracy (precision and recall) is often unsatisfactory, and ranking algorithms fail to properly reflect the semantic relevance of keywords.

    Our research focuses on improving the accuracy of multi-word search results. We propose a statistical ontology-based semantic ranking algorithm built on sentence units, and a new type of query interface that supports wildcards. First, we assign higher ranking scores to keywords that occur in the same sentence than to keywords that occur in separate sentences. Whereas existing statistical search algorithms such as N-gram models consider only sequences of adjacent keywords, our approach accounts for sequences of non-adjacent keywords as well as adjacent ones.
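    To make the sentence-unit idea concrete, here is a minimal sketch, not the thesis's actual algorithm. It assumes a naive punctuation-based sentence splitter and two hypothetical weights, SAME_SENTENCE_WEIGHT and CROSS_SENTENCE_WEIGHT; the abstract states only that same-sentence matches score higher, not the exact values. Because scoring is per sentence rather than per token window, two query terms separated by other words still earn the full same-sentence weight.

```python
import re
from itertools import product

# Hypothetical weights (assumption): the abstract only says that same-sentence
# keyword pairs score higher than pairs split across sentences.
SAME_SENTENCE_WEIGHT = 1.0
CROSS_SENTENCE_WEIGHT = 0.2


def split_sentences(text):
    """Naive splitter on ., !, ? (an assumption, not the thesis's method)."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sentence_unit_score(text, query):
    """Score a document for a multi-word query using sentence units.

    For every pair of distinct query terms, each pair of occurrences adds
    SAME_SENTENCE_WEIGHT if both occur in the same sentence (adjacent or not)
    and CROSS_SENTENCE_WEIGHT otherwise.
    """
    # Deduplicate query terms while preserving their order.
    terms = list(dict.fromkeys(t.lower() for t in query.split()))
    # Map each query term to the sentence indices where it occurs.
    positions = {t: [] for t in terms}
    for idx, sentence in enumerate(split_sentences(text)):
        for token in tokenize(sentence):
            if token in positions:
                positions[token].append(idx)
    score = 0.0
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            for si, sj in product(positions[terms[i]], positions[terms[j]]):
                score += SAME_SENTENCE_WEIGHT if si == sj else CROSS_SENTENCE_WEIGHT
    return score


same = "Ontology search improves ranking. Weather is nice."
split = "Ontology methods are old. Search engines are fast."
gap = "Ontology based web search works."
print(sentence_unit_score(same, "ontology search"))   # adjacent, same sentence
print(sentence_unit_score(split, "ontology search"))  # different sentences
print(sentence_unit_score(gap, "ontology search"))    # non-adjacent, same sentence
```

    Under these assumed weights, the non-adjacent pair in `gap` scores the same as the adjacent pair in `same`, while the split pair scores lower. A strict bigram model would reward only `same`.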
    Second, we propose a modified query interface that treats a wildcard as an independent unit of a search query. The wildcard reflects what users are actually seeking through query prediction based on actual Web data rather than on query data. Unlike existing information retrieval approaches such as proximity search, statistical language modeling, query prediction, and query answering, our statistical ontology-based model combines proximity and statistical approaches in the form of an ontology. This ontology helps improve the accuracy of Web information retrieval.
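    The wildcard idea, predicting a query term from document text rather than from past queries, can be illustrated with a minimal sketch. The function name `predict_wildcard`, the rule that `*` matches exactly one token, and ranking candidates by raw frequency are all assumptions made for illustration; they are not the interface described in the thesis.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def predict_wildcard(corpus_sentences, pattern, top_k=3):
    """Return the most frequent corpus tokens that fill '*' in pattern.

    Assumptions: '*' stands for exactly one token, the other pattern tokens
    must match literally and consecutively, and the pattern contains exactly
    one '*'. Candidates are ranked by how often they fill the wildcard.
    """
    parts = pattern.lower().split()
    if parts.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    star = parts.index("*")
    fillers = Counter()
    for sentence in corpus_sentences:
        tokens = tokenize(sentence)
        # Slide a window of the pattern's length over the sentence.
        for start in range(len(tokens) - len(parts) + 1):
            window = tokens[start:start + len(parts)]
            if all(p == "*" or p == w for p, w in zip(parts, window)):
                fillers[window[star]] += 1
    return fillers.most_common(top_k)


corpus = [
    "Semantic web search is useful.",
    "Semantic image search works.",
    "Semantic web search engines are common.",
    "Keyword search is fast.",
]
print(predict_wildcard(corpus, "semantic * search"))
```

    Here the candidates come from the indexed text itself, so a wildcard query returns terms that actually co-occur with the user's keywords in the collection.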

    We validated our methodology with a suite of experiments on the Text REtrieval Conference (TREC) document collection. The experiments focus on two-word queries, which are common in practice. After applying our statistical ontology-based algorithm to the Nutch search engine, we compared its results with those of the original Nutch search and Google Desktop Search. The results show that our methodology significantly improves accuracy.

    Host: Lizsl de Leon

    Location: Henry Salvatori Computer Science Center (SAL) - 222

    Audiences: Department Only

    Contact: Assistant to CS chair

