Fri, Jun 04, 2021 @ 09:00 AM - 11:00 AM
Thomas Lord Department of Computer Science
PhD Candidate: Emily Sheng
Title: Fairness in Natural Language Generation
Committee: Prof. Prem Natarajan (chair), Prof. Nanyun Peng (chair), Prof. Shri Narayanan, Prof. Yan Liu
Zoom link: https://usc.zoom.us/j/99069448766
Technology for natural language generation (NLG) has advanced rapidly, driven by advances in pre-training large models on massive amounts of data and by the need for intelligent agents to communicate in a natural manner. While these techniques can effectively generate fluent text, they can also produce undesirable societal biases that have a disproportionately negative impact on already marginalized populations. In this presentation, I emphasize the need for techniques that make language generation applications more fair and inclusive, and I propose several such techniques.
The first half of this presentation introduces the problem of societal biases in NLG and shows how we can use existing and novel quantitative measures as metrics to quantify biases in language generation. I start with a survey and commentary on the existing body of work on fairness in language generation. To meaningfully iterate on techniques that can reduce biases in language generation, I introduce the notion of the regard towards a demographic, and use the varying levels of regard towards different demographics as a defining metric for bias in NLG. Through this and other metrics, we can reveal the extent of bias in language generation techniques.
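The regard-based metric above compares how generated text characterizes different demographic groups. A minimal sketch of how such a comparison might be computed is shown below; the group names and regard labels are hypothetical placeholders, and in practice the labels would come from a trained regard classifier rather than being hard-coded.

```python
from collections import Counter

# Hypothetical regard labels for generations prompted with mentions of
# two demographic groups. Real labels would come from a regard classifier.
scores = {
    "group_a": ["positive", "positive", "neutral", "negative"],
    "group_b": ["negative", "negative", "neutral", "positive"],
}

def regard_ratio(labels, target="negative"):
    """Fraction of generations whose regard label matches `target`."""
    counts = Counter(labels)
    return counts[target] / len(labels)

# One possible bias indicator: the gap in negative-regard rates
# between demographic groups (larger gap -> more disparate treatment).
gap = abs(regard_ratio(scores["group_a"]) - regard_ratio(scores["group_b"]))
print(f"negative-regard gap: {gap:.2f}")
```

Aggregating regard labels per group and comparing the resulting distributions is one simple way to turn per-sentence regard judgments into a group-level bias measure.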
With the analysis and bias quantifiers introduced in the first half, the second half of this presentation focuses on methods to reduce societal biases in NLG techniques. I focus on two methods for controllable generation. The first method builds upon the idea of adversarial triggers to control societal biases in generated text when input prompts contain mentions of specific demographic groups. The second method is a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling. We introduce this second method in the context of reducing the disproportionate amount of harmful ad hominem responses faced by marginalized populations in dialogue generation.
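The constrained decoding idea can be illustrated with a small sketch: at each step, the top-k candidate continuations are scored, candidates that would complete an n-gram salient in harmful (e.g., ad hominem) responses are softly penalized, and sampling proceeds over the adjusted distribution. The salient n-gram set, penalty value, and word-level vocabulary here are illustrative assumptions, not the actual method's configuration.

```python
import math
import random

random.seed(0)

# Hypothetical set of bigrams assumed salient in harmful responses.
SALIENT_BIGRAMS = {("you", "idiot"), ("so", "stupid")}

def ngram_penalty(context, candidate, penalty=5.0):
    """Soft penalty if the candidate completes a salient bigram."""
    if context and (context[-1], candidate) in SALIENT_BIGRAMS:
        return penalty
    return 0.0

def constrained_topk_sample(context, logits, k=3):
    """Top-k sampling with salient-n-gram similarity as a soft constraint."""
    # Keep the k highest-scoring candidates.
    topk = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Subtract the soft-constraint penalty from each candidate's logit.
    adjusted = [(w, s - ngram_penalty(context, w)) for w, s in topk]
    # Renormalize with a softmax and sample.
    total = sum(math.exp(s) for _, s in adjusted)
    r, acc = random.random(), 0.0
    for w, s in adjusted:
        acc += math.exp(s) / total
        if r <= acc:
            return w
    return adjusted[-1][0]

# Toy next-word logits after the context ["you"]; the penalized word
# "idiot" is heavily downweighted but not hard-blocked (soft constraint).
logits = {"idiot": 2.0, "friend": 1.5, "person": 1.0, "colleague": 0.5}
print(constrained_topk_sample(["you"], logits))
```

Because the penalty is soft rather than a hard block, the harmful continuation remains possible but becomes much less likely, which preserves fluency while steering generation away from ad hominem patterns.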
Audiences: Everyone Is Invited
Contact: Lizsl De Leon