University of Southern California

Events Calendar

  • NL Seminar - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    Thu, Mar 07, 2024 @ 11:00 AM - 12:00 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars

    Speaker: Zixiang Chen, UCLA

    Talk Title: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    Series: NL Seminar

    Reminder: This talk will be a live presentation only; it will not be recorded. Meeting hosts only admit guests they know to the Zoom meeting, so you are highly encouraged to sign into Zoom with your USC account. If you are an outside visitor, please send your full name, title, and workplace to nlg-seminar-host(at)isi.edu beforehand so we are aware of your attendance, and let us know whether you plan to attend in person or virtually. More information on NL Seminars can be found at https://nlg.isi.edu/nl-seminar/.

    Abstract: Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this talk, I will introduce our newest fine-tuning method, Self-Play Fine-Tuning (SPIN), which improves LLMs without the need for additional human-annotated data. SPIN uses a self-play mechanism in which the LLM enhances its capabilities by generating its own training data through interactions with instances of itself. Specifically, the LLM generates training data from its previous iterations and refines its policy by learning to distinguish these self-generated responses from those obtained from human-annotated data. As a result, SPIN unlocks the full potential of human-annotated data for SFT. Our empirical results show that SPIN can improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. Additionally, I will outline the theoretical guarantees of our method. For more details and access to our code, visit our GitHub repository (https://github.com/uclaml/SPIN).
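
    The self-play idea in the abstract can be illustrated with a small sketch. It is not taken from the talk or the SPIN repository. It assumes a logistic loss on the difference of log-probability ratios between a human-annotated response and a self-generated one, which is one common way to express "distinguish self-generated responses from human-annotated data". The function names, the arguments, and the scale `lam` are hypothetical.

```python
import math


def logistic_loss(t):
    """Return log(1 + exp(-t)), evaluated in a numerically stable way."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def spin_style_loss(logp_human_new, logp_human_old,
                    logp_self_new, logp_self_old, lam=0.1):
    """Illustrative per-example loss (assumed form, not the authors' code).

    logp_human_*: log-probability of the human-annotated response under the
                  model being trained (new) and the previous iteration (old).
    logp_self_*:  the same quantities for a response generated by the
                  previous iteration.
    lam:          hypothetical scaling factor for the margin.
    """
    human_gain = logp_human_new - logp_human_old
    self_gain = logp_self_new - logp_self_old
    # The loss shrinks as the new model favors the human response more,
    # relative to the old model, than it favors the self-generated one.
    return logistic_loss(lam * (human_gain - self_gain))


# No change from the previous iteration: margin is zero, loss is log(2).
baseline = spin_style_loss(-5.0, -5.0, -3.0, -3.0)
# New model raises the human response and lowers the self-generated one.
improved = spin_style_loss(-4.0, -5.0, -4.0, -3.0)
print(baseline, improved)
```

    Under these assumptions, the loss decreases only when the updated model moves probability toward the human-annotated response and away from its own earlier output, which is the behavior the abstract describes.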

    Biography: Zixiang Chen is a Ph.D. student in the Department of Computer Science at the University of California, Los Angeles (UCLA), advised by Prof. Quanquan Gu. He obtained his bachelor's degree in mathematics from Tsinghua University. He is broadly interested in the theory and applications of deep learning, optimization, and control, with a focus on generative models, representation learning, and multi-agent reinforcement learning. Recently, he has been using AI to enhance scientific discovery in the domain of public health. He was a visiting graduate student in the Theory of Reinforcement Learning program at the Simons Institute for the Theory of Computing.

    If the speaker approves recording of this NL Seminar talk, it will be posted on our USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI. Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/

    Host: Jon May and Justin Cho

    More Info: https://nlg.isi.edu/nl-seminar/

    Webcast: https://youtu.be/Fg4C6YZcqQ4

    Location: Information Sciences Institute (ISI) - Virtual and ISI Conference Room #689

    Audiences: Everyone Is Invited

    Contact: Pete Zamar

    Event Link: https://nlg.isi.edu/nl-seminar/

