CS Colloq: Kamalika Chaudhuri
Thu, Mar 11, 2010 @ 03:30 PM - 05:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Talk Title: Statistical Algorithms for Modern Datasets
Speaker: Kamalika Chaudhuri
Host: Prof. Gaurav Sukhatme
Abstract:
In this talk, we address two issues that arise when learning from modern datasets. First, with the increase in electronic record-keeping, many datasets that learning algorithms work with contain sensitive information about individuals. Thus the problem of privacy-preserving learning -- how to design learning algorithms that operate on the sensitive data of individuals while still guaranteeing the privacy of individuals in the training set -- has gained great practical importance. We address the problem of privacy-preserving classification, and we present an efficient classifier which is private in the differential privacy model of Dwork et al. Our classifier works in the ERM (empirical loss minimization) framework, and includes privacy-preserving logistic regression and privacy-preserving support vector machines. We show that our classifier is private, provide analytical bounds on the sample requirement of our classifier, and evaluate it on some real data.
A second characteristic of modern datasets is that data is often available from multiple domains or views. For example, when clustering a document corpus such as Wikipedia, we have access to the contents of the documents and their link structure. We address this problem of multiview clustering -- how to use information from multiple views to improve clustering performance. We present an algorithm for multiview clustering, provide analytical bounds on the performance of our algorithm under certain statistical assumptions, and finally evaluate our algorithm on some real data.
Based on joint work with Sham Kakade (UPenn), Karen Livescu (TTI Chicago), Claire Monteleoni (CCLS Columbia), Anand Sarwate (ITA UCSD), and Karthik Sridharan (TTI Chicago).
Bio:
Kamalika Chaudhuri received a Bachelor of Technology degree in Computer Science and Engineering in 2002 from the Indian Institute of Technology, Kanpur, and a PhD in Computer Science from UC Berkeley in 2007. She is currently a postdoctoral researcher in the Computer Science and Engineering Department at UCSD. Kamalika's research is on the design and analysis of machine learning algorithms and their applications. In particular, her interests lie in clustering, online learning, and privacy-preserving machine learning, and in the applications of machine learning and algorithms to practical problems in other areas.
Location: Seaver Science Library (SSL) - 150
Audiences: Everyone Is Invited
Contact: CS Front Desk