Thu, Aug 04, 2022 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: 1.) Taiwei Shi and 2.) Jonne Saleva, USC/ISI Interns
Talk Title: 1.) Improving Moderation of Online Discussions via Nonviolent Communication 2.) Linguistic heritage-aware language model adaptation for diasporic languages
Abstract: Only the first segment of this seminar will be recorded; the second portion will be live only.
Meeting hosts only admit guests that they know to the Zoom meeting. Hence, you are highly encouraged to use your USC account to sign into Zoom.
If you are an outside visitor, please inform us at (nlg DASH seminar DASH host AT isi DOT edu) beforehand so we will be aware of your attendance and can let you in.
In-person attendance will be permitted for USC ISI faculty, staff, and students only. The seminar is open to the public virtually via the Zoom registration link and online.
1. Abstract for Taiwei Shi:
The growing number of comments makes online discussions difficult for human moderators alone to moderate. A crucial limitation of current automated moderation is that the generated responses are repetitive, generic, and judgmental, which makes them ineffective at changing someone's mind and behavior. We seek to build dialogue models that can intervene in an adversarial conversation whose participants have abandoned reasoned discussion and descended into personal attacks. Although this is a difficult problem even among humans, we would like to explore the effectiveness of Nonviolent Communication (NVC), an approach to restoring breakdowns in communication.
In this talk, we will discuss strategies for incorporating one aspect of NVC, called observation without evaluation (O vs. E), into dialogue models. First, we obtain a sufficiently large set of O vs. E dialogue data to train an O vs. E classifier. We then expand this to a set large enough to fine-tune a dialogue model. We also explore text style transfer to rewrite moderation datasets, so that the model can actively intervene in toxic conversations while being less judgmental. Finally, we will discuss strategies for evaluating the dialogue model and conclude with future directions.
2. Abstract for Jonne Saleva:
Multilingual language models (MLLMs) have proven their effectiveness as cross-lingual representation learners that perform well on several downstream tasks and a variety of languages, including many lower-resourced and zero-shot ones. Although effective, MLLMs remain somewhat opaque, and the nature of their cross-linguistic transfer is difficult to understand. While it seems plausible that higher- and lower-resourced languages should share information within the model, what is less clear is how such transfer is mediated by linguistic relatedness.
In this talk, we investigate this problem through the lens of diasporic languages, which can be crudely understood as a combination of a co-cultural language and a co-territorial language. Specifically, we ask whether augmenting MLLM adaptation using these ancestral languages, or some mixture of them, can improve MLLM performance on a lower-resourced diasporic language, both in terms of perplexity and extrinsically on a named entity recognition task. We outline preliminary results on Yiddish, a Germanic language spoken by Ashkenazi Jews, and discuss the effectiveness of using German and Hebrew as ancestral languages. Finally, we contrast regular ancestral pretraining with recent lexicon-based adaptation approaches by Wang et al. (2022) and conclude with directions for future work.
Biography: 1. Taiwei Shi BIO:
Taiwei Shi is a current summer intern for the Natural Language Group at USC ISI under Professors Jonathan May and Xuezhe Ma. He is also an undergraduate student at the Georgia Institute of Technology, majoring in Computer Science and Mathematics. He has previously worked at Georgia Tech's SALT lab under Professor Diyi Yang. He is working towards a career where he can pursue his interests and make an impact in natural language processing, especially in the fields of computational social science and philosophy.
2. Jonne Saleva BIO:
Jonne Saleva is a summer intern in the Natural Language Group at USC ISI, working on language modeling for lower-resourced diasporic languages under Prof. Jonathan May. Jonne is also a Ph.D. student in Computer Science at Brandeis University, where he is working on NLP for morphologically rich and lower-resourced languages as part of the Broadening Linguistic Technologies Lab led by Prof. Constantine Lignos. Prior to his doctoral studies, Jonne received his M.S. in Computer Science from Brandeis University and A.B. in Statistics from Harvard College in 2017.
Host: Jon May and Thamme Gowda
More Info: https://nlg.isi.edu/nl-seminar/
Audiences: Everyone Is Invited
Contact: Pete Zamar