USC - Viterbi School of Engineering

Nov
09

NL Seminar - Manipulating Large Language Model Predictions Through Data
Thu, Nov 09, 2023 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars

Speaker: Alexander Wan, University of Cal-Berkeley

Talk Title: Manipulating Large Language Model Predictions Through Data

Series: NL Seminar

Abstract: This talk will be a live presentation only, it will not be recorded.
REMINDER: Meeting hosts only admit guests that they know to the Zoom meeting. Hence, you are highly encouraged to use your USC account to sign into Zoom.
If you’re an outside visitor, please provide your: Full Name, Title and Name of Workplace to (nlg-seminar-host(at)isi.edu) beforehand so we’ll be aware of your attendance. Also, let us know if you plan to attend in-person or virtually.
More Info on NL Seminars can be found at: https://nlg.isi.edu/nl-seminar/
Large language models use large amounts of unmoderated data at each stage of the training and deployment pipeline. In this talk, I will show how these lax requirements enable adversaries to manipulate both training and test data, allowing a myriad of possible attacks. First, during training time, I will show that adversaries can modify instruction-tuning datasets to systematically manipulate predictions across a range of tasks or induce degenerate outputs across hundreds of arbitrary tasks, using as few as 100 poison examples. At inference time, additional data is often used in retrieval- or tool-augmented models. Naturally, these models will face information from a wide variety of sources that have varying degrees of quality. Humans are also faced with this same range of sources but can make judgements of trustworthiness based on factors like the style of argumentation or the recency of information. We show that not only do model predictions differ significantly from human credibility judgements, but also that gaps in this judgement creates opportunities for adversaries to manipulate answers to user queries.

Biography: Alexander Wan is a third-year undergraduate at UC Berkeley majoring in Computer Science, Statistics, and Mathematics. He works closely with folks at the Berkeley NLP Group and the MSU Heterogeneous Learning and Reasoning lab, with a focus on improving the robustness and interpretability of large language models. He's also more broadly interested in the intersection of machine learning and cognitive science: using current ML models to better understand human cognition and building more robust models through cognitively inspired architectures and training.

Host: Jon May and Justin Cho

More Info: https://nlg.isi.edu/nl-seminar/

Webcast: https://usc.zoom.us/j/95174101995
Location: Information Science Institute (ISI) - Virtual and ISI-Conf Rm#689
WebCast Link: https://usc.zoom.us/j/95174101995
Audiences: Everyone Is Invited

Contact: Pete Zamar

Event Link: https://nlg.isi.edu/nl-seminar/

This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.
Add to Google Calendar

Return to Calendar

Events Calendar

NL Seminar - Manipulating Large Language Model Predictions Through Data