  • NL Seminar - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    Thu, Mar 07, 2024 @ 11:00 AM - 12:00 PM

    Information Sciences Institute

    Speaker: Zixiang Chen, UCLA

    Talk Title: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    Series: NL Seminar

    Abstract: Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this talk, I will introduce our newest fine-tuning method, Self-Play Fine-Tuning (SPIN), which improves LLMs without the need for additional human-annotated data. SPIN uses a self-play mechanism in which the LLM improves by playing against instances of itself. Specifically, the LLM generates its own training data from its previous iterations and refines its policy by distinguishing these self-generated responses from those in the human-annotated data. As a result, SPIN unlocks the full potential of human-annotated data for SFT. Our empirical results show that SPIN can improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. Additionally, I will outline the theoretical guarantees of our method. For more details and access to our code, visit our GitHub repository (https://github.com/uclaml/SPIN).
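
    The self-play step described in the abstract can be illustrated with a small sketch. It is not taken from the announcement or from the SPIN repository: it assumes a logistic loss on the difference of log-likelihood ratios between the model being trained and its previous iteration, and the function name, argument names, and the weight `lam` are hypothetical.

```python
import math

def spin_pair_loss(logp_new_human, logp_old_human,
                   logp_new_synth, logp_old_synth, lam=0.1):
    """Illustrative per-example loss for one self-play iteration (assumed form).

    Each argument is a log-probability of a full response given the prompt:
      *_human: the human-annotated response from the SFT data
      *_synth: a response generated by the previous model iteration
      new/old: scored by the model being trained / the previous iteration
    """
    # How much more the new model prefers the human response over the
    # self-generated one, relative to the previous iteration.
    margin = lam * ((logp_new_human - logp_old_human)
                    - (logp_new_synth - logp_old_synth))
    # Logistic loss log(1 + exp(-margin)), written to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

    Under these assumptions, the loss equals log 2 when the new model matches the previous iteration, and it falls below log 2 as the new model assigns relatively more probability to the human response than to the self-generated one.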

    Biography: Zixiang Chen is a Ph.D. student in the Department of Computer Science at the University of California, Los Angeles (UCLA), advised by Prof. Quanquan Gu. He obtained his bachelor's degree in mathematics from Tsinghua University. He is broadly interested in the theory and applications of deep learning, optimization, and control, with a focus on generative models, representation learning, and multi-agent reinforcement learning. Recently, he has been using AI to enhance scientific discovery in the domain of public health. He was a visiting graduate student in the Theory of Reinforcement Learning program at the Simons Institute for the Theory of Computing.

    Host: Jon May and Justin Cho

    More Info: https://nlg.isi.edu/nl-seminar/

    Webcast: https://youtu.be/Fg4C6YZcqQ4

    Location: Information Sciences Institute (ISI) - Virtual and ISI-Conf Rm#689



