NL Seminar-Do Androids Know They're Only Dreaming of Electric Sheep?
Mon, Mar 18, 2024 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Sky Wang, Columbia University
Talk Title: Do Androids Know They're Only Dreaming of Electric Sheep?
Series: NL Seminar
Abstract: REMINDER: This talk will be a live presentation only; it will not be recorded. Meeting hosts will only admit guests they know to the Zoom meeting, so you are strongly encouraged to sign in to Zoom with your USC account. If you're an outside visitor, please send your full name, title, and workplace to (nlg-seminar-host(at)isi.edu) beforehand so we are aware of your attendance. Also, let us know whether you plan to attend in person or virtually. More info on NL Seminars can be found at: https://nlg.isi.edu/nl-seminar/

We design probes, trained on the internal representations of a transformer language model, that predict its hallucinatory behavior on in-context generation tasks. To facilitate this detection, we create a span-annotated dataset of organic and synthetic hallucinations across several tasks. We find that probes trained on the force-decoded states of synthetic hallucinations are generally ecologically invalid for detecting organic hallucinations. Furthermore, the information that hidden states carry about hallucination appears to be task- and distribution-dependent. The saliency of intrinsic and extrinsic hallucinations varies across layers, hidden-state types, and tasks; notably, extrinsic hallucinations tend to be more salient in a transformer's internal representations. Our probes outperform multiple contemporary baselines, showing that probing is a feasible and efficient alternative for evaluating language model hallucination when model states are available.
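For readers unfamiliar with probing, the idea can be illustrated with a minimal sketch, assuming the simplest possible setup: a linear (logistic-regression) probe that reads one hidden-state vector per token and predicts whether that token falls inside an annotated hallucination span. This is a generic illustration, not the speaker's method; the function names `train_linear_probe` and `predict` are hypothetical, and the "hidden states" here are synthetic random vectors standing in for activations that would, in practice, be taken from a chosen transformer layer.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(states, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe p = sigmoid(w . h + b) by gradient descent.

    states: one hidden-state vector per token (hypothetical inputs)
    labels: 1 if the token lies inside an annotated hallucination span, else 0
    """
    dim = len(states[0])
    w = [0.0] * dim
    b = 0.0
    n = len(states)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for h, y in zip(states, labels):
            p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        # Average the gradients over the batch and take one step.
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, h, threshold=0.5):
    # Returns 1 if the probe flags this token as hallucinated.
    return int(sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) >= threshold)


# Toy stand-in for hidden states: 8-dimensional random vectors whose first
# coordinate is shifted for "hallucinated" tokens (an assumption for illustration).
random.seed(0)
states, labels = [], []
for _ in range(200):
    y = random.randint(0, 1)
    h = [random.gauss(0, 1) for _ in range(8)]
    h[0] += 2.0 if y else -2.0
    states.append(h)
    labels.append(y)

w, b = train_linear_probe(states, labels)
accuracy = sum(predict(w, b, h) == y for h, y in zip(states, labels)) / len(states)
print(f"training accuracy: {accuracy:.2f}")
```

A linear probe is a common choice because it is cheap to train and only succeeds if the signal is (close to) linearly readable from the representation; the talk's actual probe architecture, layers, and hidden-state types may differ.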
Biography: If the speaker approves recording of this NL Seminar talk, it will be posted on our USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI. Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/

Sky is a Ph.D. candidate in Computer Science at Columbia University, advised by Zhou Yu and Smaranda Muresan. His research centers on Natural Language Processing (NLP), with broad interests where NLP meets Computational Social Science (CSS). Within this space, his work focuses on three major areas: (1) revealing and designing for social difference and inequality, (2) cross-cultural NLP, and (3) mechanistic interpretability. His research is supported by an NSF Graduate Research Fellowship and has received two outstanding paper awards at EMNLP. He has previously interned at Microsoft Semantic Machines, Google Research, and Amazon AWS AI.
Hosts: Jon May and Justin Cho
More Info: https://nlg.isi.edu/nl-seminar/
Location: Information Sciences Institute (ISI) - Virtual and ISI-Conf Rm#689
WebCast Link: https://www.youtube.com/watch?v=Pm0ljFMg0cw
Audiences: Everyone Is Invited
Contact: Pete Zamar
Event Link: https://nlg.isi.edu/nl-seminar/