PhD Dissertation Defense - Xisen Jin
Thu, Jun 12, 2025 @ 01:00 PM - 03:00 PM
Thomas Lord Department of Computer Science
Title: Towards Continual Learning of Language Models in the Wild
Date and Time: Thursday, June 12th, 2025 | 1:00p - 3:00p
Location: Ronald Tutor Hall of Engineering (RTH) 306
Committee Members: Xiang Ren (Chair), Jesse Thomason, Mahdi Soltanolkotabi
Abstract: Large language models (LLMs/LMs) have become the foundation of many artificial intelligence (AI) applications and greatly benefit users in seeking information and completing tasks. Alongside the success of LLMs, there is an increasing need to promptly update these models for new application domains and new factual knowledge, and to mitigate harmful behaviors. Large-scale models and complicated data distributions have introduced challenges unforeseen in earlier studies of continual learning; at the same time, new paradigms of building models, e.g., fine-tuning open-source models, have become prevalent. These new challenges and resources create a context for continual learning of language models, which we term continual learning in the wild, that differentiates the problem from past studies.
The thesis focuses on identifying and addressing the emerging challenges in continual learning of language models. In the first part of the thesis, I propose training and evaluation protocols representative of two different goals of continual learning. I create two datasets, namely a domain-incremental research paper stream and a chronologically ordered tweet stream, alongside downstream datasets to test model capability. In addition, I extensively evaluate existing and new continual learning algorithms in these setups and find that knowledge distillation from past model checkpoints stands out as an effective continual learning algorithm.
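As a hedged illustration of the distillation idea referenced above (the function name, mixing weight, and temperature are assumptions for this sketch, not the thesis's exact formulation), one common form combines the task loss on new data with a KL term that keeps the updated model close to the previous checkpoint:

```python
# Minimal sketch of continual learning via distillation from the previous
# checkpoint. Assumes a HuggingFace-style model that returns .loss and .logits.
import torch
import torch.nn.functional as F

def distillation_step(model, past_model, batch, alpha=0.5, temperature=2.0):
    """One training step: cross-entropy on new data plus KL to the past checkpoint."""
    outputs = model(**batch)                      # current model, trainable
    with torch.no_grad():
        past_logits = past_model(**batch).logits  # frozen previous checkpoint

    ce_loss = outputs.loss                        # standard task loss on new data
    kl_loss = F.kl_div(
        F.log_softmax(outputs.logits / temperature, dim=-1),
        F.softmax(past_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    return (1 - alpha) * ce_loss + alpha * kl_loss
```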
In the second part of the thesis, I study how merging the weights of existing models can fuse the knowledge of multiple models without access to the original training data. I propose a novel model merging algorithm, RegMean, which is simple to implement, computationally efficient, and significantly outperforms baseline merging algorithms.
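A minimal sketch of a RegMean-style closed-form merge for a single linear layer is shown below; the variable names and the off-diagonal scaling constant are illustrative assumptions, not the exact published configuration:

```python
# RegMean-style merging sketch for one linear layer: given each model's weight
# W_i and the Gram matrix G_i = X_i^T X_i of its layer inputs, the merged
# weight solves a least-squares problem in closed form.
import numpy as np

def regmean_merge(weights, grams, nondiag_scale=0.9):
    """weights: list of (d_in, d_out) arrays; grams: list of (d_in, d_in) arrays."""
    scaled = []
    for G in grams:
        S = nondiag_scale * G
        np.fill_diagonal(S, np.diag(G))    # keep diagonal, shrink off-diagonal terms
        scaled.append(S)
    lhs = sum(scaled)                                   # sum_i G_i
    rhs = sum(S @ W for S, W in zip(scaled, weights))   # sum_i G_i W_i
    return np.linalg.solve(lhs, rhs)                    # (sum_i G_i)^{-1} (sum_i G_i W_i)
```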
In the final part of the thesis, I introduce my work on analyzing patterns of upstream knowledge forgetting in continual learning. I interpret significant patterns of forgetting in upstream data that arise when fine-tuning LLMs. The analysis demonstrates that accurate predictions about forgetting can be made using embedding similarity models, or matrix completion from a small set of observed occurrences of forgetting. I further illustrate how predicting forgetting can lead to simple and effective continual learning algorithms.
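As a hedged sketch of the matrix-completion idea mentioned above (the factorization rank, learning rate, and step count are assumptions, not the thesis's exact method), one can fill in a partially observed forgetting matrix, with rows indexed by fine-tuning runs and columns by upstream examples, using a low-rank factorization fit only on the observed entries:

```python
# Minimal low-rank matrix completion sketch: predict unobserved forgetting
# scores from a sparse set of observed (run, upstream example) entries.
import numpy as np

def complete_matrix(observed, mask, rank=8, lr=0.05, steps=2000, seed=0):
    """observed: (n_runs, n_examples) with zeros where unobserved; mask: 1 = observed."""
    rng = np.random.default_rng(seed)
    n, m = observed.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(steps):
        pred = U @ V.T
        err = (pred - observed) * mask            # gradient only through observed entries
        U -= lr * (err @ V) / mask.sum()
        V -= lr * (err.T @ U) / mask.sum()
    return U @ V.T                                # full predicted forgetting matrix
```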
Audiences: Everyone Is Invited
Contact: Xisen Jin
This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.