Seminar will be exclusively online (no in-room presentation) - CS Colloquium: Baharan Mirzasoleiman (Stanford University) - Efficient Machine Learning via Data Summarization
Tue, Mar 31, 2020 @ 04:00 PM - 05:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Baharan Mirzasoleiman, Stanford University
Talk Title: Efficient Machine Learning via Data Summarization
Series: CS Colloquium
Abstract: Large datasets have been crucial to the success of modern machine learning models. However, training on massive data has two major limitations. First, it depends on exceptionally large and expensive computational resources, and it incurs substantial costs due to its significant energy consumption.
Second, in many real-world applications such as medical diagnosis and self-driving cars, large datasets contain highly imbalanced classes and noisy labels. In such cases, training on the entire dataset does not yield a high-quality model. In this talk, I will argue that we can address these limitations by developing techniques that identify and extract representative subsets from massive datasets. Training on representative subsets not only reduces the substantial costs of learning from big data, but also improves the accuracy of the learned models and their robustness against noisy labels. I will present two key aspects of achieving this goal: (1) extracting representative data points by summarizing massive datasets; and (2) developing efficient optimization methods to learn from the extracted summaries. I will discuss how we can develop theoretically rigorous techniques that provide strong guarantees for the quality of the extracted summaries, as well as for the quality of the learned models and their robustness against noisy labels. I will also show applications of these techniques to several problems, including summarizing massive image collections, online video summarization, and speeding up the training of machine learning models.
This lecture satisfies requirements for CSCI 591: Research Colloquium
Biography: Baharan Mirzasoleiman is a Postdoctoral Research Scholar in the Computer Science Department at Stanford University, where she works with Prof. Jure Leskovec. Baharan's research focuses on developing new methods that enable efficient exploration of, and learning from, massive datasets. She received her PhD from ETH Zurich, where she worked with Prof. Andreas Krause. She has also spent two summers as an intern at Google Research. She was awarded an ETH Medal for Outstanding Doctoral Dissertation and a Google Anita Borg Memorial Scholarship. She was also selected as a Rising Star in EECS by MIT.
Host: Bistra Dilkina
Location: Seminar will be exclusively online (no in-room presentation)
Audiences: Everyone Is Invited
Contact: Assistant to the CS Chair