Events Calendar


  • NL Seminar - How to Steal ChatGPT's Embedding Size, and Other Low-rank Logit Tricks

    Thu, Apr 25, 2024 @ 11:00 AM - 12:00 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars


    Speaker: Matt Finlayson, USC

    Talk Title: How to Steal ChatGPT's Embedding Size, and Other Low-rank Logit Tricks

    Series: NL Seminar

    Abstract: The commercialization of large language models (LLMs) has led to the common practice of restricting access to proprietary models via a limited API. In this work we show that, with only a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1,000 USD for OpenAI’s gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We exploit this fact to unlock several capabilities, including (but not limited to) obtaining cheap full-vocabulary outputs, auditing for specific types of model updates, identifying the source LLM given a single full LLM output, and even efficiently discovering the LLM’s hidden size. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI’s gpt-3.5-turbo to be about 4096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability.

    *Meeting hosts only admit online guests that they know to the Zoom meeting, so you are highly encouraged to use your USC account to sign into Zoom. If you are an outside visitor, please email nlg-seminar-host(at)isi.edu at least one business day prior to the event so that we are aware of your attendance and can admit you. Specify whether you will attend remotely or in person, and provide your full name, job title, and professional affiliation. Please arrive at least 10 minutes before the seminar begins.

    If you do not have access to the 6th floor for in-person attendance, please check in at the 10th floor main reception desk to register as a visitor, and someone will escort you to the conference room.
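
As a rough illustration of the softmax bottleneck mentioned in the abstract, the sketch below simulates a toy model whose logits are the product of a vocabulary-by-hidden projection and a hidden state, then recovers the hidden size as the numerical rank of a matrix of collected logit vectors. This is not the speaker's method: the function names, the toy sizes, and the tolerance are all assumptions made for illustration, and a real API would typically expose only partial or transformed outputs rather than raw logits.

```python
import numpy as np


def simulate_logits(vocab_size, hidden_size, num_queries, seed=0):
    """Toy stand-in for an LLM's final layer (hypothetical, for illustration).

    Each column is one full-vocabulary logit vector W @ h, so every output
    lies in the column space of W, which has dimension at most hidden_size.
    """
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((vocab_size, hidden_size))
    hidden_states = rng.standard_normal((hidden_size, num_queries))
    return projection @ hidden_states


def estimate_hidden_size(logit_matrix, rel_tol=1e-8):
    """Estimate the hidden size as the numerical rank of the logit matrix.

    Singular values below rel_tol times the largest one are treated as zero.
    The tolerance is an assumed value suited to this noise-free simulation.
    """
    singular_values = np.linalg.svd(logit_matrix, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


# Collect more outputs than the (unknown) hidden size, then count the
# directions the outputs actually span.
logits = simulate_logits(vocab_size=500, hidden_size=32, num_queries=100)
estimated = estimate_hidden_size(logits)
print(estimated)  # 32
```

The estimate is only meaningful when the number of collected outputs exceeds the hidden size; with fewer outputs, the rank is capped by the number of queries instead.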

    Biography: Matthew Finlayson is a PhD student studying NLP at the University of Southern California. Previously, he was a predoctoral researcher at the Allen Institute for AI (AI2) after completing his bachelor's degree in computer science and linguistics at Harvard University. Matthew is interested in the practical consequences of the architectural design of language models, from security to generation, as well as understanding how language models learn and generalize from data.

    Host: Jon May and Justin Cho

    More Info: https://www.isi.edu/research-groups-nlg/nlg-seminars/

    Webcast: https://www.youtube.com/watch?v=3U9nA-l2YAs

    Location: Information Sciences Institute (ISI) - Conf Rm #689

    Audiences: Everyone Is Invited

    Contact: Pete Zamar

    Event Link: https://www.isi.edu/research-groups-nlg/nlg-seminars/
