
Events Calendar


  • PhD Defense - Sahil Garg

    Fri, May 03, 2019 @ 03:00 PM - 05:00 PM

    Thomas Lord Department of Computer Science

    University Calendar


    PhD Candidate:
    Sahil Garg

    Date/Time: Friday, May 3, 2019, 3:00 PM - 5:00 PM
    Location: GFS 213

    Committee:
    Aram Galstyan (chair)
    Kevin Knight
    Greg Ver Steeg
    Roger Georges Ghanem
    Irina Rish

    Dissertation Title: Hashcode Representations of Natural Language for Relation Extraction

    This thesis studies the problem of identifying and extracting relationships between biological entities from the text of scientific papers. For the relation extraction task, state-of-the-art performance has been achieved by classification methods based on convolutional kernels, which facilitate sophisticated reasoning on natural language text using structural similarities between sentences and/or their parse trees. Despite their success, however, kernel-based methods are difficult to customize and computationally expensive to scale to large datasets. We address the first problem by proposing a nonstationary extension of the conventional convolutional kernels for improved expressiveness and flexibility. For scalability, we propose to employ kernelized locality-sensitive hashcodes as explicit representations of natural language structures, which can be used as feature-vector inputs to arbitrary classification methods. We propose a theoretically justified method for optimizing these representations, based on approximate and efficient maximization of the mutual information between the hashcodes and their class labels. We evaluate the proposed approach on multiple biomedical relation extraction datasets and observe significant and robust improvements in accuracy over state-of-the-art classifiers, along with a drastic, orders-of-magnitude speedup compared to conventional kernel methods.
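    To make the hashing step concrete, here is a small, self-contained sketch of the idea (not the dissertation's exact construction): each binary hash bit thresholds a random combination of kernel similarities to a handful of reference sentences, and the resulting hashcodes are used as feature vectors for an off-the-shelf classifier. The token-overlap kernel, the tiny reference set, and the logistic-regression classifier are illustrative stand-ins for the convolutional kernels and classifiers studied in the thesis.

        # A toy sketch of kernelized locality-sensitive hashing (KLSH):
        # each hash bit thresholds a random signed combination of kernel
        # similarities to a small set of reference sentences. The kernel,
        # data, and classifier below are illustrative assumptions only.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def token_kernel(a, b):
            # Toy similarity: token-overlap count, a stand-in for a
            # convolutional kernel over sentences or parse trees.
            return float(len(set(a.split()) & set(b.split())))

        def hashcode(sentence, references, weights, kernel=token_kernel):
            # Kernel similarities of the sentence to the reference set.
            sims = np.array([kernel(sentence, r) for r in references])
            sims = sims - sims.mean()          # center before thresholding
            return (weights @ sims > 0).astype(int)

        rng = np.random.default_rng(0)
        train = ["protein A binds protein B", "gene X regulates gene Y",
                 "no interaction was observed", "the assay failed entirely"]
        labels = [1, 1, 0, 0]

        # One random weight vector over the references per hash bit.
        num_bits = 16
        weights = rng.standard_normal((num_bits, len(train)))

        # Hashcodes become explicit feature vectors for any classifier.
        X = np.array([hashcode(s, train, weights) for s in train])
        clf = LogisticRegression().fit(X, labels)
        print(clf.predict([hashcode("protein C binds protein D", train, weights)]))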
    Finally, we introduce a nearly-unsupervised framework for learning kernel- or neural-network-based hashcode representations. We define an information-theoretic objective that leverages both labeled and unlabeled data points for fine-grained optimization of each hash function, and we propose a greedy algorithm for maximizing that objective. This novel learning paradigm is beneficial for building hashcode representations that generalize from a training set to a test set. We conduct a thorough experimental evaluation on the relation extraction datasets and demonstrate that the proposed extension yields superior accuracy compared with state-of-the-art supervised and semi-supervised approaches, such as variational autoencoders and adversarial neural networks. An added benefit of the proposed representation learning technique is that it is easily parallelizable, interpretable, and, owing to its generality, applicable to a wide range of NLP problems.
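    The greedy, per-bit optimization described above can also be illustrated with a small sketch. The score used here, mutual information with the labels on labeled data, plus a bit-entropy term on unlabeled data, minus an mRMR-style redundancy penalty against already-chosen bits, is a simplified assumption for illustration; the dissertation's actual objective and optimization procedure may differ in detail.

        # A toy sketch of greedily choosing hash functions one at a time to
        # maximize an information-theoretic score that uses both labeled and
        # unlabeled data. The scoring rule and the random candidate pool are
        # illustrative assumptions, not the thesis's exact objective.
        import numpy as np
        from sklearn.metrics import mutual_info_score

        def bit_entropy(bits):
            # Entropy (in bits) of one binary hash bit; rewards balanced bits.
            p = bits.mean()
            return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

        def greedy_select(cand_labeled, labels, cand_unlabeled, num_bits,
                          alpha=0.5, beta=0.5):
            # Greedily add the candidate bit with the best score:
            # MI(bit; label) + alpha * H(bit) - beta * redundancy with chosen bits.
            chosen, remaining = [], list(range(cand_labeled.shape[1]))
            for _ in range(num_bits):
                scores = []
                for j in remaining:
                    relevance = mutual_info_score(labels, cand_labeled[:, j]) / np.log(2)
                    diversity = alpha * bit_entropy(cand_unlabeled[:, j])
                    redundancy = (np.mean([mutual_info_score(cand_unlabeled[:, j],
                                                             cand_unlabeled[:, k]) / np.log(2)
                                           for k in chosen]) if chosen else 0.0)
                    scores.append(relevance + diversity - beta * redundancy)
                chosen.append(remaining.pop(int(np.argmax(scores))))
            return chosen

        rng = np.random.default_rng(1)
        labels = rng.integers(0, 2, size=50)
        # Candidate pool: 30 random binary hash functions evaluated on 50 labeled
        # and 200 unlabeled examples (stand-ins for KLSH bits on real sentences).
        cand_labeled = rng.integers(0, 2, size=(50, 30))
        cand_unlabeled = rng.integers(0, 2, size=(200, 30))
        print(greedy_select(cand_labeled, labels, cand_unlabeled, num_bits=8))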

    Location: Grace Ford Salvatori Hall Of Letters, Arts & Sciences (GFS) - 213

    Audiences: Everyone Is Invited

    Contact: Lizsl De Leon

