AI Seminar-Experiments in Scaling Reinforcement Learning with Verifiable Rewards
Fri, Apr 04, 2025 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Nathan Lambert, Allen Institute for AI
Talk Title: Experiments in Scaling Reinforcement Learning with Verifiable Rewards
Series: AI Seminar
Abstract: With the release of DeepSeek’s R1 reasoning model, interest in reinforcement learning may be at an all-time high. Academics are pouring energy into the space, trying to replicate DeepSeek’s results and establish the trade-offs and capabilities of this new era of reinforcement learning on language models. This talk discusses new results from language models trained with Reinforcement Learning with Verifiable Rewards (RLVR), our efforts to scale RLVR for Ai2’s OLMo and Tülu language models, hints we may have missed that RL is more effective than people give it credit for, and some history from my background in model-based RL and robotics. The goal of the talk is to combine recent historical context on language modeling with cutting-edge RL research to forecast how the rapidly expanding language model industry may change in the near future.
Biography: Nathan Lambert is a Senior Research Scientist and post-training lead at the Allen Institute for AI, focusing on building open language models. He also founded and operates Interconnects.ai to increase transparency and understanding of current AI models and systems.
Previously, he helped build an RLHF research team at HuggingFace. He received his PhD from the University of California, Berkeley, working at the intersection of machine learning and robotics. He was advised by Professor Kristofer Pister in the Berkeley Autonomous Microsystems Lab and Roberto Calandra at Meta AI Research. He was lucky to intern at Facebook AI and DeepMind during his PhD. Nathan was awarded the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism for his efforts to better community norms.
If the speaker approves recording of this seminar, the recording will be posted on the USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI
Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/
Host: Eric Boxer and Justina Gilleland
More Info: https://www.isi.edu/events/5553/experiments-in-scaling-reinforcement-learning-with-verifiable-rewards/
Webcast: https://usc.zoom.us/j/94409584905?pwd=Sm5LVkd0bndUdEluM3piK0NWTUQrUT09
Location: Information Sciences Institute (ISI) - Virtual Only
Audiences: Everyone Is Invited
Contact: Pete Zamar
This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.