Logo: University of Southern California

Events Calendar


  • AI Seminar-Fast algorithms for nearest neighbor classification

    Fri, Jul 25, 2014 @ 11:00 AM - 12:00 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars


    Speaker: Sanjoy Dasgupta, UCSD

    Talk Title: Fast Algorithms for Nearest Neighbor Classification

    Series: Artificial Intelligence Seminar

    Abstract: Nearest neighbor (NN) search is one of the simplest and most enduring methods of statistical estimation. We examine its algorithmic complexity via two results.
    1. Randomized tree structures for fast NN search
    The k-d tree was one of the first spatial data structures proposed for NN search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in several situations of interest: when the data are drawn from a doubling measure; when the data and query distributions are identical and are supported on a set of bounded doubling dimension; and when the data are documents from a topic model.
    2. Data structures that adapt to a query distribution
    Can we leverage learning techniques to build a fast NN retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this framework through two popular NN data structures: k-d trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.
    This is joint work with Lawrence Cayton, Eugene Che, Kaushik Sinha, and Zhen Zhai.

    Biography: Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from Berkeley in 2000, and spent two years at AT&T Research Labs before joining UCSD. His area of research is algorithmic statistics, with a focus on unsupervised and minimally supervised learning. He is the author of a textbook, "Algorithms" (with Christos Papadimitriou and Umesh Vazirani), that appeared in 2006.

    Home Page:
    http://cseweb.ucsd.edu/users/dasgupta/

    Host: Greg Ver Steeg

    Webcast: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=84af215b6be04e48b5f164352b9f20e31d

    Location: 11th Flr Conf Rm # 1135, Marina Del Rey

    WebCast Link: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=84af215b6be04e48b5f164352b9f20e31d

    Audiences: Everyone Is Invited

    Contact: Peter Zamar

    Add to Google CalendarDownload ICS File for OutlookDownload iCal File

Return to Calendar