-
AI Seminar-Fast algorithms for nearest neighbor classification
Fri, Jul 25, 2014 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Sanjoy Dasgupta, UCSD
Talk Title: Fast Algorithms for Nearest Neighbor Classification
Series: Artificial Intelligence Seminar
Abstract: Nearest neighbor (NN) search is one of the simplest and most enduring methods of statistical estimation. We examine its algorithmic complexity via two results.
1. Randomized tree structures for fast NN search
The k-d tree was one of the first spatial data structures proposed for NN search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in several situations of interest: when the data are drawn from a doubling measure; when the data and query distributions are identical and are supported on a set of bounded doubling dimension; and when the data are documents from a topic model.
2. Data structures that adapt to a query distribution
Can we leverage learning techniques to build a fast NN retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this framework through two popular NN data structures: k-d trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.
This is joint work with Lawrence Cayton, Eugene Che, Kaushik Sinha, and Zhen Zhai.
Biography: Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from Berkeley in 2000, and spent two years at AT&T Research Labs before joining UCSD. His area of research is algorithmic statistics, with a focus on unsupervised and minimally supervised learning. He is the author of a textbook, "Algorithms" (with Christos Papadimitriou and Umesh Vazirani), that appeared in 2006.
Home Page:
http://cseweb.ucsd.edu/users/dasgupta/
Host: Greg Ver Steeg
Webcast: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=84af215b6be04e48b5f164352b9f20e31dLocation: 11th Flr Conf Rm # 1135, Marina Del Rey
WebCast Link: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=84af215b6be04e48b5f164352b9f20e31d
Audiences: Everyone Is Invited
Contact: Peter Zamar