Mon, May 07, 2018 @ 12:00 PM - 02:00 PM
Title: Mutual Information Estimation and Its Applications to Machine Learning
PhD Candidate: Shuyang Gao
Date: May 7
Location: Social Sciences Building (SOS) B37
Committee: Aram Galstyan, Greg Ver Steeg, Ilias Diakonikolas, Aiichiro Nakano, Roger Ghanem
Mutual information (MI) has been successfully applied across a wide variety of domains thanks to its ability to measure dependencies between random variables. Despite its popularity and widespread usage, estimating mutual information from data remains a fundamental challenge. In this thesis, we demonstrate that a popular class of nonparametric MI estimators based on k-nearest-neighbor graphs requires a number of samples that scales exponentially with the true MI. Consequently, accurate estimation of MI between strongly dependent variables is possible only at prohibitively large sample sizes. This important yet overlooked shortcoming of the existing estimators stems from their implicit reliance on the local uniformity of the underlying joint distribution. This thesis therefore proposes two new estimation strategies to address the issue. The new estimators are robust to local non-uniformity, work well with limited data, and can capture relationship strengths over many more orders of magnitude than existing k-nearest-neighbor methods.
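For context, the k-nearest-neighbor estimators analyzed in the abstract are typified by the Kraskov-Stögbauer-Grassberger (KSG) estimator. The brute-force sketch below is a hypothetical illustration of that estimator for scalar variables, not the candidate's code; it checks the estimate against the closed-form MI of correlated Gaussians.

```python
# Illustrative sketch of the KSG k-nearest-neighbor MI estimator
# (algorithm 1, max-norm), the class of estimator the thesis analyzes.
# Brute-force O(n^2) distances; for exposition only.
import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG mutual information estimate (in nats) for scalar x, y."""
    n = len(x)
    x = np.asarray(x, float).reshape(n, 1)
    y = np.asarray(y, float).reshape(n, 1)
    dx = np.abs(x - x.T)                 # pairwise |x_i - x_j|
    dy = np.abs(y - y.T)
    dz = np.maximum(dx, dy)              # Chebyshev metric in the joint space
    # Distance to the k-th nearest neighbor in the joint space
    # (index k skips the zero self-distance at index 0).
    eps = np.sort(dz, axis=1)[:, k]
    # Count marginal neighbors strictly within eps, excluding the point itself.
    nx = (dx < eps[:, None]).sum(axis=1) - 1
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Sanity check on correlated Gaussians, whose true MI is known in closed form.
rng = np.random.default_rng(0)
rho, n = 0.6, 1000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
true_mi = -0.5 * np.log(1 - rho**2)      # about 0.223 nats
est = ksg_mi(x, y)
```

At moderate dependence (rho = 0.6) the estimator tracks the truth well; the sample-size pathology the thesis identifies appears as rho approaches 1, where the required n grows exponentially with the true MI.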
Modern data mining and machine learning present us with problems that may contain thousands of variables, among which we need to identify only the strongest, most promising relationships. Caution must therefore be taken when applying mutual information in such real-world scenarios. With these concerns in mind, the thesis demonstrates the practical applicability of mutual information on several tasks. Our contributions include:
i) an information-theoretic framework for measuring stylistic coordination in dialogues; the proposed measure has a simple predictive interpretation and can account for various confounding factors through proper conditioning;
ii) a new algorithm for mutual information-based feature selection in the supervised learning setting;
iii) an information-theoretic framework for learning disentangled and interpretable representations in the unsupervised setting using deep neural networks.
For the latter two tasks, we propose using a variational lower bound for efficient estimation and optimization of mutual information. For the last task, we also establish a close connection between the learning objective and variational autoencoders (VAEs).
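The variational lower bound mentioned above is commonly the Barber-Agakov bound, I(X;Y) >= H(Y) + E[log q(Y|X)], which holds for any conditional model q. The sketch below is a hypothetical, minimal illustration of that bound: it fits q(y|x) as a linear-Gaussian model in closed form (where the thesis would train a neural network) and compares the bound to the known MI of correlated Gaussians.

```python
# Illustrative sketch of the Barber-Agakov variational lower bound
#   I(X;Y) >= H(Y) + E[log q(Y|X)],
# using a hand-fit linear-Gaussian q(y|x) in place of a learned network.
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.6, 5000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Fit q(y|x) = N(a*x + b, s2) by least squares; for jointly Gaussian data
# this family contains the true conditional, so the bound should be tight.
a, b = np.polyfit(x, y, 1)
resid = y - (a * x + b)
s2 = resid.var()

# H(Y) under a Gaussian model, and the expected log-likelihood under q.
h_y = 0.5 * np.log(2 * np.pi * np.e * y.var())
e_log_q = -0.5 * np.log(2 * np.pi * s2) - np.mean(resid**2) / (2 * s2)

bound = h_y + e_log_q                    # lower-bounds I(X;Y)
true_mi = -0.5 * np.log(1 - rho**2)      # about 0.223 nats
```

With a richer q (e.g. a neural network), the same bound becomes a trainable objective, which is what makes it usable for feature selection and for representation learning.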
Audiences: Everyone Is Invited
Contact: Lizsl De Leon