
Expressive speech-driven facial animation

Posted on: 2006-11-16
Degree: Ph.D
Type: Thesis
University: University of California, Los Angeles
Candidate: Cao, Yong
GTID: 2458390005993238
Subject: Computer Science
Abstract/Summary:
Facial animation is an essential component of many applications that involve realistic virtual humans. However, realistic facial animation remains one of the most challenging problems in computer graphics. In this dissertation, we present a novel approach for automatically synthesizing expressive speech-driven facial animation. Our approach relies on a database of high-fidelity recorded facial motions, which includes speech-related motions with variations across multiple emotions. The input to our system is a spoken utterance and a set of emotional tags, which can be specified by a user or extracted from the speech signal by a classifier. Our system outputs a realistic facial animation that is synchronized to the input audio and faithfully conveys the specified emotions.

The contributions of our work are primarily twofold. First, we propose a speech motion synthesis approach that generates realistic lip motion matching the input speech. Second, we propose an emotion mapping approach that allows us to control expressive visual behavior during speech.

We introduce a novel representation of a recorded facial motion database, called the Anime Graph. Given an input utterance, our lip-synching module searches the Anime Graph for a matching facial motion that satisfies a set of proposed criteria. We also present a greedy search algorithm that yields vastly superior performance over most motion-graph-based algorithms; its time complexity is linear in the length of the input utterance. In our experiments, the synthesis time for a sentence of average length is under a second.

To control expressive visual behavior during speech, we propose an emotion mapping approach. Using independent component analysis, a facial motion is first decomposed into two types of components: emotion (style) and speech (content). We then collect a set of speech-related motions that share the same speech content but differ in emotion. By learning from the emotion components of these motions, we build a mapping function that transfers a speech-related motion from one emotion space to another.
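To make the graph search concrete, the following is a minimal sketch of a greedy walk over a motion graph, in the spirit of the Anime Graph search described above. The node structure, the phoneme-based matching cost, and the fallback jump to the full candidate set are illustrative assumptions, not the dissertation's actual data structures or criteria.

```python
# Hypothetical sketch of a greedy lip-sync search over a motion graph.
# AnimeNode, match_cost, and the graph layout are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class AnimeNode:
    phoneme: str                                     # phoneme label of this motion segment
    motion: list = field(default_factory=list)       # facial motion frames for the segment
    successors: list = field(default_factory=list)   # edges to compatible next segments

def match_cost(node: AnimeNode, target_phoneme: str) -> float:
    """Illustrative matching criterion: 0 for an exact phoneme match, 1 otherwise.
    The dissertation's criteria are richer (audio features, smoothness, etc.)."""
    return 0.0 if node.phoneme == target_phoneme else 1.0

def greedy_lip_sync(start_nodes, phonemes):
    """Greedily pick, for each input phoneme, the cheapest reachable node.
    A single pass over the utterance keeps the work linear in its length."""
    path = []
    candidates = start_nodes
    for ph in phonemes:
        best = min(candidates, key=lambda n: match_cost(n, ph))
        path.append(best)
        # If the chosen segment has no outgoing edges, allow a graph-wide jump.
        candidates = best.successors or start_nodes
    # Concatenate the motion frames along the chosen path.
    return [frame for node in path for frame in node.motion]
```

Because each phoneme triggers one selection over the current candidate set, the total work grows linearly with the number of phonemes, which is consistent with the linear complexity stated above.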
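The style/content separation and emotion mapping can be sketched similarly. The snippet below uses scikit-learn's FastICA as a stand-in for the ICA step and a least-squares linear map between emotion components learned from content-matched motion pairs; the number of components, the indices treated as "emotion", and the linear form of the map are all assumptions made for illustration.

```python
# Minimal sketch of ICA-based style/content separation and emotion mapping,
# assuming `motion` is a (frames x markers) NumPy array. Component count and
# which components count as "emotion" are hand-picked here for illustration.

import numpy as np
from sklearn.decomposition import FastICA

def decompose(motion, n_components=8, emotion_idx=(0, 1)):
    """Split a facial motion into independent components and pull out the
    ones treated as emotion (style); the rest are speech (content)."""
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(motion)          # (frames, n_components)
    emotion = sources[:, list(emotion_idx)]
    return ica, sources, emotion

def learn_emotion_map(emotion_src, emotion_dst):
    """Least-squares linear map from one emotion space to another, learned
    from paired motions that share speech content but differ in emotion."""
    W, *_ = np.linalg.lstsq(emotion_src, emotion_dst, rcond=None)
    return W

def retarget(ica, sources, W, emotion_idx=(0, 1)):
    """Map the emotion components into the target emotion space and
    reconstruct the full facial motion from the modified components."""
    mapped = sources.copy()
    mapped[:, list(emotion_idx)] = sources[:, list(emotion_idx)] @ W
    return ica.inverse_transform(mapped)
```

In this sketch, the speech (content) components pass through unchanged, so the lip motion is preserved while only the expressive style is remapped, mirroring the separation of emotion and speech described above.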
Keywords/Search Tags:Speech, Facial, Motion, Expressive, Realistic