PhD Thesis Defense - Sarik Ghazarian
Wed, Aug 23, 2023 @ 04:00 PM - 06:00 PM
Thomas Lord Department of Computer Science
Committee Members: Aram Galstyan, Nanyun Peng, Kallirroi Georgila, Gaurav Sukhatme, Morteza Dehghani
Title: Automatic Evaluation of Open Domain Dialogue Systems
Abstract: With the rapid development of open-domain dialogue systems in recent years, it is imperative to have precise evaluation metrics that correctly assess the quality of these systems. To this end, many researchers resort primarily to human evaluation, which is time-consuming, expensive, and does not facilitate model comparisons across research papers. Accurate automatic evaluation metrics are therefore necessary to accelerate the development cycle by assisting architecture search and hyperparameter tuning. Reference-based metrics such as BLEU or ROUGE fail to correlate well with human judgment in open-domain settings, as there can be many plausible generations that do not overlap significantly with the limited set of given references. This failure has led research toward learning-based evaluation metrics that are more sophisticated and reliable.
Automatic evaluation of open-domain dialogue systems has a multifaceted nature with many fine-grained quality aspects. This dissertation explores both turn-level and conversation-level facets of open-domain dialogue evaluation. We train models that automatically assess the relevance, engagement, coherence, and commonsense aspects of the responses generated by dialogue models. We formulate the evaluation as a classification task that identifies the quality of the responses. To this end, we focus on the training data and the model architecture of these metrics, the two main components on which metric quality strongly relies. We start with heuristic text-level manipulations, such as random swapping of utterances, to create negative samples for training evaluation metrics. We then show that such manipulations are insufficient to appropriately reflect the issues that occur in interactions between advanced dialogue models and humans. To tackle this issue, we propose advanced semantic-level perturbations of human-written responses to generate challenging negative responses that more closely resemble those produced by state-of-the-art dialogue models. Next, we complete our investigation of dialogue evaluation by concentrating on the model architecture of these metrics, incorporating knowledge from knowledge bases and leveraging prompt-based generative models in a low-resource setting. Finally, beyond dialogue assessment, the main goal of automatic evaluation metrics, we leverage them as influential control factors that guide dialogue models to generate higher-quality responses.
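As a rough illustration of the negative-sampling heuristic mentioned in the abstract (not taken from the dissertation; the example dialogues and the build_training_pairs helper below are hypothetical), the following Python sketch pairs each dialogue context with its original response as a positive example and with a response randomly swapped in from another dialogue as a negative example. A classifier-based evaluation metric would then be trained to distinguish the two.

```python
import random

# Toy dialogue data: (context, human-written response). Hypothetical examples.
dialogues = [
    ("How was your weekend?", "Great, I went hiking in the mountains."),
    ("Do you like jazz?", "Yes, especially live performances."),
    ("What are you cooking tonight?", "Probably pasta with fresh basil."),
]

def build_training_pairs(dialogues, seed=0):
    """Create (context, response, label) triples for a relevance classifier."""
    rng = random.Random(seed)
    pairs = []
    for i, (context, response) in enumerate(dialogues):
        pairs.append((context, response, 1))  # positive: original pairing
        # Negative: randomly swap in a response from a different dialogue.
        j = rng.choice([k for k in range(len(dialogues)) if k != i])
        pairs.append((context, dialogues[j][1], 0))
    return pairs

for context, response, label in build_training_pairs(dialogues):
    print(label, "|", context, "->", response)
```

As the abstract notes, such random swaps are only a starting point; the dissertation moves on to semantic-level perturbations that better reflect the errors of modern dialogue models.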
Audiences: Everyone Is Invited
Contact: Melissa Ochoa
Event Link: https://usc.zoom.us/j/97105095544?pwd=Q05tWTdLSFdhNS9EY2JRMklWbHRkUT09