USC - Viterbi School of Engineering

PhD Defense - Sahil Garg
Fri, May 03, 2019 @ 03:00 PM - 05:00 PM
Thomas Lord Department of Computer Science

PhD Candidate:
Sahil Garg

DateTime: 5/3 from 3pm to 5pm
Location: GFS 213.

Committee:
Aram Galstyan (chair)
Kevin Knight
Greg Ver Steeg
Roger Georges Ghanem
Irina Rish

Dissertation Title: Hashcode Representations of Natural Language for Relation Extraction

This thesis studies the problem of identifying and extracting relationships between biological entities from the text of scientific papers. For the relation extraction task, state-of-the-art performance has been achieved by classification methods based on convolutional kernels, which facilitate sophisticated reasoning on natural language text using structural similarities between sentences and/or their parse trees. Despite their success, however, kernel-based methods are difficult to customize and computationally expensive to scale to large datasets. We address the first problem by proposing a nonstationary extension to the conventional convolutional kernels for improved expressiveness and flexibility. For scalability, we propose to employ kernelized locality-sensitive hashcodes as explicit representations of natural language structures, which can be used as feature-vector inputs to arbitrary classification methods. We propose a theoretically justified method for optimizing the representations, based on approximate and efficient maximization of the mutual information between the hashcodes and their class labels. We evaluate the proposed approach on multiple biomedical relation extraction datasets and observe significant and robust improvements in accuracy over state-of-the-art classifiers, along with orders-of-magnitude speedups compared to conventional kernel methods.
Finally, we introduce a nearly unsupervised framework for learning kernel- or neural-hashcode representations. We define an information-theoretic objective that leverages both labeled and unlabeled data points for fine-grained optimization of each hash function, and propose a greedy algorithm for maximizing that objective. This novel learning paradigm is beneficial for building hashcode representations that generalize from a training set to a test set. We conduct a thorough experimental evaluation on the relation extraction datasets and demonstrate that the proposed extension leads to superior accuracies with respect to state-of-the-art supervised and semi-supervised approaches, such as variational autoencoders and adversarial neural networks. An added benefit of the proposed representation learning technique is that it is easily parallelizable, interpretable, and, owing to its generality, applicable to a wide range of NLP problems.
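
As a rough illustration only, and not the method from the dissertation, the idea of turning kernel similarities into binary hashcodes and scoring a hash bit by its mutual information with class labels might be sketched as follows. Everything here is an assumption made for the example: the function names are hypothetical, a simple token-overlap (Jaccard) similarity stands in for a convolutional kernel over parse structures, and each hash function is a random split of a random reference subset.

```python
import math
import random
from collections import Counter


def token_kernel(a, b):
    """Toy similarity: Jaccard overlap of token sets (stand-in for a real kernel)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def make_hash_functions(reference, num_bits, subset_size, rng):
    """Build one hash function per bit: a random reference subset split into two groups."""
    functions = []
    for _ in range(num_bits):
        subset = rng.sample(reference, subset_size)
        half = subset_size // 2
        functions.append((subset[:half], subset[half:]))
    return functions


def hashcode(sentence, functions, kernel=token_kernel):
    """Bit is 1 when the most similar reference example lies in the first group."""
    bits = []
    for group_a, group_b in functions:
        best_a = max(kernel(sentence, r) for r in group_a)
        best_b = max(kernel(sentence, r) for r in group_b)
        bits.append(1 if best_a >= best_b else 0)
    return bits


def mutual_information(bits, labels):
    """Empirical mutual information (in bits) between one hash bit and the labels."""
    n = len(bits)
    joint = Counter(zip(bits, labels))
    bit_counts = Counter(bits)
    label_counts = Counter(labels)
    mi = 0.0
    for (b, y), count in joint.items():
        p_joint = count / n
        # p(b, y) / (p(b) * p(y)) simplifies to count * n / (count_b * count_y)
        mi += p_joint * math.log2(count * n / (bit_counts[b] * label_counts[y]))
    return mi


# Hypothetical toy data, invented for this sketch
reference = [
    "protein A activates protein B",
    "gene X inhibits gene Y",
    "protein C binds protein D",
    "drug Z has no effect on gene W",
    "enzyme E phosphorylates protein F",
    "gene Q is unrelated to protein R",
]
functions = make_hash_functions(reference, num_bits=4, subset_size=4, rng=random.Random(0))
code = hashcode("protein A inhibits gene Y", functions)
```

The resulting bit list can be fed to any classifier as a feature vector, and `mutual_information` applied to one bit across a labeled set gives a crude score for how informative that hash function is about the labels.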

Location: Grace Ford Salvatori Hall Of Letters, Arts & Sciences (GFS) - 213
Audiences: Everyone Is Invited

Contact: Lizsl De Leon

This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.