PhD Dissertation Defense - Mozhdeh Gheini
Thomas Lord Department of Computer Science
Title: Inductive Biases for Data- and Parameter-Efficient Transfer Learning
Date and Time: Fri, Jan 24, 2025 @ 10:00 AM - 12:00 PM
Location: Henry Salvatori Computer Science Center (SAL) - 213 and https://usc.zoom.us/j/6564802162
Committee Members: Jonathan May (Chair), Emilio Ferrara, Xuezhe Ma, Khalil Iskarous
Abstract: Data- and resource-intensive pre-training and fine-tuning of Transformer-based models is the dominant paradigm behind rapid advances in natural language processing, human language technologies, and, most notably, large language models. Such reliance on massive amounts of data, computation, and energy, while effective and impressive from a performance-only perspective, can hinder the open, nonexclusive, and sustainable development of these technologies. In this talk, we present how inductive biases can be devised to adapt current natural language processing methods to resource-constrained scenarios and provide insights into why the proposed inductive biases succeed in such settings.
Specifically, we discuss four research directions on the data and parameter efficiency of fine-tuning and transfer learning in natural language processing: (1) a universal regimen that creates a single pre-trained checkpoint suitable for machine translation transfer to practically any language pair, eliminating the need for ad hoc pre-training; (2) an architecture-guided parameter-efficient fine-tuning method that performs competitively with full fine-tuning while updating only cross-attention parameters (see the sketch below); (3) an analysis, through the lens of transfer learning, of MEGA, a recently introduced augmentation of the Transformer architecture that incorporates an explicit recency bias; and (4) a meta-learning algorithm that primes pre-trained models for specific fine-tuning strategies.
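To make direction (2) concrete, here is a minimal sketch, not the dissertation's actual implementation, of cross-attention-only fine-tuning in PyTorch. It assumes a Hugging Face encoder-decoder translation model whose decoder cross-attention modules carry "encoder_attn" in their parameter names; the checkpoint and the name-matching rule are illustrative assumptions.

```python
# Hedged sketch: fine-tune only the cross-attention parameters of an
# encoder-decoder Transformer, freezing everything else. The naming below
# ("encoder_attn") follows Hugging Face Marian/BART-style models and is an
# assumption, not the exact setup presented in the defense.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")

for name, param in model.named_parameters():
    # Leave gradients enabled only for decoder cross-attention weights.
    param.requires_grad = "encoder_attn" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Updating {trainable:,} of {total:,} parameters "
      f"({100 * trainable / total:.1f}%)")

# From here, any standard fine-tuning loop with an optimizer over
# filter(lambda p: p.requires_grad, model.parameters()) updates only
# the cross-attention blocks.
```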
Combined with ablations that show why they are effective and analyses that demonstrate their generalizability, these directions are meant to serve as tools for resource-efficient transfer learning in natural language processing.
Audiences: Everyone Is Invited
Contact: Mozhdeh Gheini