CS Colloquium: Le Song (GATECH) - Discriminative Embedding of Latent Variable Models for Structured Data
Tue, Sep 27, 2016 @ 04:00 PM - 05:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Le Song, GATECH
Talk Title: Discriminative Embedding of Latent Variable Models for Structured Data
Series: Yahoo! Labs Machine Learning Seminar Series
Abstract: This lecture satisfies requirements for CSCI 591: Computer Science Research Colloquium. Part of Yahoo! Labs Machine Learning Seminar Series.
Structured data, such as sequences, trees, graphs and hypergraphs, are prevalent in a number of interdisciplinary areas such as network analysis, knowledge engineering, computational biology, drug design and materials science. The availability of large amounts of such structured data has posed great challenges for the machine learning community. How should such data be represented to capture their similarities or differences? How can predictive models be learned efficiently from a large amount of such data? How can structured data be generated de novo given certain desired properties?
A common approach to tackling these challenges is to first design a similarity measure between two data points, called the kernel function, based on either statistics of the substructures or probabilistic generative models; a machine learning algorithm then optimizes a predictive model based on that similarity measure. However, this elegant two-stage approach has difficulty scaling up, and discriminative information is not exploited during the design of the similarity measure.
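As a rough illustration of the first stage, the sketch below computes a similarity measure from substructure statistics. It assumes a k-mer spectrum kernel on strings, which is one common choice and not necessarily the kernel discussed in the talk; the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Similarity of two sequences: inner product of their k-mer count vectors."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for missing keys, so unshared k-mers contribute nothing.
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


def gram_matrix(seqs, k=2):
    """Pairwise similarities for a dataset (stage one of the two-stage approach)."""
    return [[spectrum_kernel(s, t, k) for t in seqs] for s in seqs]
```

A kernel-based learner would then be trained on the output of `gram_matrix`. That matrix has one entry per pair of data points, so its size grows quadratically with the dataset, which is one source of the scaling difficulty mentioned above.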
In this talk, I will present Structure2Vec, an effective and scalable approach for representing structured data based on the idea of embedding latent variable models into a feature space, and learning such a feature space using discriminative information. Interestingly, Structure2Vec extracts features by performing a sequence of nested nonlinear operations in a way similar to graphical model inference procedures, such as mean field and belief propagation. In applications involving genome and protein sequences, drug molecules and energy materials, Structure2Vec consistently produces state-of-the-art predictive performance. Furthermore, in the materials property prediction problem involving 2.3 million data points, Structure2Vec produces a more accurate model that is also 10,000 times smaller. In the end, I will also discuss potential improvements over current work, possible extensions to network analysis and computer vision, and thoughts on the structured data design problem.
Biography: Le Song is an assistant professor in the Department of Computational Science and Engineering, College of Computing, Georgia Institute of Technology. He received his Ph.D. in Machine Learning from the University of Sydney and NICTA in 2008, and then conducted his post-doctoral research in the Department of Machine Learning, Carnegie Mellon University, between 2008 and 2011. Before he joined Georgia Institute of Technology, he was a research scientist at Google. His principal research direction is machine learning, especially kernel methods and probabilistic graphical models for large-scale and complex problems arising from artificial intelligence, network analysis, computational biology and other interdisciplinary domains. He is the recipient of the AISTATS'16 Best Student Paper Award, IPDPS'15 Best Paper Award, NSF CAREER Award'14, NIPS'13 Outstanding Paper Award, and ICML'10 Best Paper Award. He has also served as an area chair or senior program committee member for many leading machine learning and AI conferences such as ICML, NIPS, AISTATS and AAAI, and as an action editor for JMLR.
Host: Yan Liu
Location: Henry Salvatori Computer Science Center (SAL) - 101
Audiences: Everyone Is Invited
Contact: Assistant to CS chair