BEGIN:VCALENDAR
METHOD:PUBLISH
PRODID:-//Apple Computer\, Inc//iCal 1.0//EN
X-WR-CALNAME;VALUE=TEXT:USC
VERSION:2.0
BEGIN:VEVENT
DESCRIPTION:PhD Candidate:\n Sahil Garg\n \n DateTime: 5/3 from 3pm to 5pm\n Location: GFS 213.\n \n Committee:\n Aram Galstyan (chair)\n Kevin Knight\n Greg Ver Steeg\n Roger Georges Ghanem\n Irina Rish\n \n Dissertation Title: Hashcode Representations of Natural Language for Relation Extraction\n \n This thesis studies the problem of identifying and extracting relationships between biological entities from the text of scientific papers. For the relation extraction task, state-of-the-art performance has been achieved by classification methods based on convolutional kernels, which facilitate sophisticated reasoning on natural language text using structural similarities between sentences and/or their parse trees. Despite their success, however, kernel-based methods are difficult to customize and computationally expensive to scale to large datasets. We address the first problem by proposing a nonstationary extension to the conventional convolutional kernels for improved expressiveness and flexibility. For scalability, we propose to employ kernelized locality-sensitive hashcodes as explicit representations of natural language structures, which can be used as feature-vector inputs to arbitrary classification methods. We propose a theoretically justified method for optimizing the representations that is based on approximate and efficient maximization of the mutual information between the hashcodes and their class labels. We evaluate the proposed approach on multiple biomedical relation extraction datasets, and observe significant and robust improvements in accuracy over state-of-the-art classifiers, along with orders-of-magnitude speedups compared to conventional kernel methods.\n Finally, we introduce a nearly-unsupervised framework for learning kernel- or neural-hashcode representations. We define an information-theoretic objective which leverages both labeled and unlabeled data points for fine-grained optimization of each hash function, and propose a greedy algorithm for maximizing that objective. This novel learning paradigm is beneficial for building hashcode representations that generalize from a training set to a test set. We conduct a thorough experimental evaluation on the relation extraction datasets, and demonstrate that the proposed extension leads to superior accuracies with respect to state-of-the-art supervised and semi-supervised approaches, such as variational autoencoders and adversarial neural networks. An added benefit of the proposed representation learning technique is that it is easily parallelizable, interpretable, and, owing to its generality, applicable to a wide range of NLP problems.\n \n
SEQUENCE:5
DTSTART:20190503T150000
LOCATION:GFS 213
DTSTAMP:20190503T150000
SUMMARY:PhD Defense - Sahil Garg
UID:EC9439B1-FF65-11D6-9973-003065F99D04
DTEND:20190503T170000
END:VEVENT
END:VCALENDAR