PhD Defense - Chung-Cheng Chiu
Tue, May 13, 2014 @ 12:00 PM - 02:00 PM
Thomas Lord Department of Computer Science
Title: Generating Gestures from Speech for Virtual Humans Using Machine Learning Approaches
PhD Candidate: Chung-Cheng Chiu
Committee:
Stacy Marsella (Chair)
Jonathan Gratch
Louis-Philippe Morency
Ulrich Neumann
Stephen Read (outside member)
Time: 12pm
Location: EEB 248
There is a growing demand for animated characters capable of simulating face-to-face interaction using the same verbal and nonverbal behavior that people use. For example, research in virtual human technology seeks to create autonomous characters capable of interacting with humans using spoken dialog. Further, as video games have moved beyond first-person shooters, gameplay increasingly involves social interaction in which virtual characters interact with each other and with the player's avatar. Common to these applications is the expectation that autonomous characters exhibit behavior resembling that of a real human.
The focus of this work is generating realistic gestures for virtual characters, specifically the coverbal gestures that are performed in close relation to the content and timing of speech. A conventional approach for animating gestures is to construct gesture animations for each utterance the character speaks, either by handcrafting animations or by using motion capture techniques. The problem with this approach is that it is costly in both time and money, and it is not feasible for characters designed to generate novel utterances on the fly.
This thesis addresses using machine learning approaches to learn a data-driven gesture generator from human conversational data, one that can generate behavior for novel utterances and therefore saves development effort. This work assumes that learning to generate gestures from speech is a feasible task. The framework exploits a classification scheme for gestures to provide domain knowledge and help the machine learning models realize the generation of gestures from speech. The framework is composed of two components: one realizes the relation between speech and gesture classes, and the other performs gesture generation based on the gesture classes. To facilitate the training process, this research collected real-world conversation data from dyadic interviews and a set of motion capture data of human gesturing while speaking. The evaluation experiments assess the effectiveness of each component by comparing it with state-of-the-art approaches, and they evaluate the overall performance through studies involving human subjective evaluations. An alternative machine learning framework has also been proposed for comparison with the framework addressed in this thesis. The evaluation experiments show that the framework outperforms state-of-the-art approaches.
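To make the two-component structure concrete, the following is a minimal Python sketch of such a pipeline. It is illustrative only, not the models used in the thesis: the gesture label set, feature representation, and class/function names here are assumptions, and the placeholder predictors stand in for whatever trained sequence and motion models the framework actually uses.

# Hypothetical sketch of a two-stage speech-to-gesture pipeline.
# Stage 1 maps speech features to gesture classes; stage 2 maps the
# predicted classes (plus speech) to motion. All names are illustrative.
from dataclasses import dataclass
from typing import List

import numpy as np

GESTURE_CLASSES = ["beat", "deictic", "iconic", "metaphoric", "rest"]  # assumed label set


@dataclass
class SpeechFrame:
    prosody: np.ndarray   # e.g. pitch/energy features for one time step
    words: List[str]      # words overlapping this time step


class GestureClassPredictor:
    """Stage 1: map speech features to a distribution over gesture classes."""

    def predict(self, frames: List[SpeechFrame]) -> np.ndarray:
        # Placeholder: a trained sequence model would go here; this returns
        # a uniform distribution just so the sketch runs end to end.
        return np.full((len(frames), len(GESTURE_CLASSES)), 1.0 / len(GESTURE_CLASSES))


class GestureMotionGenerator:
    """Stage 2: turn per-frame gesture class probabilities into motion."""

    def generate(self, class_probs: np.ndarray, frames: List[SpeechFrame]) -> np.ndarray:
        # Placeholder: a motion model trained on motion capture would synthesize
        # joint angles conditioned on the predicted class and the speech features.
        num_joints = 20  # assumed skeleton size
        return np.zeros((len(frames), num_joints))


def speech_to_gesture(frames: List[SpeechFrame]) -> np.ndarray:
    """Run both stages: speech -> gesture classes -> motion frames."""
    predictor = GestureClassPredictor()
    generator = GestureMotionGenerator()
    class_probs = predictor.predict(frames)
    return generator.generate(class_probs, frames)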
The central contribution of this research is a machine learning framework capable of learning to generate gestures from conversation data collected from different individuals while preserving the motion style of specific speakers. In addition, the framework allows the incorporation of data recorded through other media and thereby significantly enriches the training data. The resulting model provides an automatic approach for deriving a gesture generator that realizes the relation between speech and gestures. A secondary contribution is a novel time-series prediction algorithm that predicts gestures from the utterance. This algorithm can address time-series problems with complex input and can be applied to other time-series classification tasks.
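For readers unfamiliar with the task framing, the sketch below shows the shape of this kind of time-series prediction problem: per-frame speech features are stacked with their temporal neighbors and mapped to a gesture class at each time step. This is a generic sliding-window baseline for illustration only, not the algorithm proposed in the thesis; the window radius, feature dimensions, and toy data are all assumptions.

# Illustrative baseline: per-frame gesture class prediction from speech
# features using a sliding window of local context. Not the thesis's method.
import numpy as np
from sklearn.linear_model import LogisticRegression


def make_windows(features: np.ndarray, radius: int = 5) -> np.ndarray:
    """Stack each frame with its neighbors so the classifier sees local context."""
    padded = np.pad(features, ((radius, radius), (0, 0)), mode="edge")
    return np.stack(
        [padded[t : t + 2 * radius + 1].ravel() for t in range(len(features))]
    )


# Toy data: 200 frames of 12-dimensional speech features with per-frame labels.
rng = np.random.default_rng(0)
speech = rng.normal(size=(200, 12))
labels = rng.integers(0, 3, size=200)  # e.g. 3 gesture classes

X = make_windows(speech)
clf = LogisticRegression(max_iter=1000).fit(X, labels)
predicted = clf.predict(X)  # one gesture class per time step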
Location: Hughes Aircraft Electrical Engineering Center (EEB) - 248
Audiences: Everyone Is Invited
Contact: Lizsl De Leon