
Multi-modal Behaviors Data Mining For Virtual Human Synthesis

Posted on: 2004-06-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Chen
Full Text: PDF
GTID: 1118360185995657
Subject: Computer application technology

Abstract/Summary:
To synthesize realistic virtual human multi-modal behaviors (speech, lip motion, facial expression, and gesture), synchronization among these behaviors is crucial, in addition to the realism of each individual behavior. This dissertation discusses how to apply and improve data mining methods for this key problem in virtual human multi-modal behavior synthesis. The contributions of the dissertation are as follows:

1) Data preprocessing: an MPEG-4 based labeled face feature-tracking method is adopted to obtain audio-visual synchronization data. The method avoids expensive capture equipment while still producing accurate data that conforms to the MPEG-4 standard. For segmenting the audio-visual synchronization data, a new quantitative segmentation method is proposed that makes segmentation simpler. For preprocessing, a method that generates face animation parameters from MPEG-4 labeled face feature points is adopted, exploring the possibility of extracting MPEG-4 face animation parameters (FAPs) directly from video.

2) Data feature extraction: a new MPEG-4 based visual speech feature representation, the face animation parameter pattern (FAPP), is proposed. The dissertation demonstrates how to apply unsupervised clustering and statistical methods to FAPP extraction (an illustrative sketch of this clustering step follows the abstract). Based on a large amount of audio-visual data, 29 basic FAPPs that describe facial motion characteristics and 15 basic orthogonal vectors that can synthesize FAPPs are obtained. Experiments show that the proposed visual speech feature representation effectively realizes audio-visual data mapping and vivid face animation.

3) Lip synchronization learning: aiming at the lip synchronization problem in a speech-driven face animation system, the dissertation addresses the complex many-to-many learning problem of designing a model that captures audio and visual context information while supporting real-time animation. Two learning methods are proposed: a FAPP-based audio-to-visual neural network mapping method, and a Parameter Dynamic Transition Network (PDTN) based audio-to-visual real-time mapping method. The former mainly considers real-time operation and the use of audio context information: based on clustering and the correlation of preceding and following frames, it maps speech feature vectors containing context information to face animation parameter patterns (a sketch of this mapping also follows the abstract). The latter goes further, considering not only real-time operation and audio context information but also the statistical context of lip motion and expression. Experiments show that both methods are effective and greatly improve the realism of lip synchronization in a speech-driven face animation system.

4) Multi-modal behavior data synchronization learning: this dissertation addresses two...
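The following is a minimal, illustrative sketch of how the unsupervised clustering and basis extraction described in contribution 2 could be set up: FAP frame vectors are clustered into 29 basic patterns and reduced to 15 orthogonal basis vectors. The array shapes, the use of k-means and PCA, and all variable names are assumptions for illustration, not the dissertation's exact procedure.

```python
# Illustrative sketch only: k-means stands in for the unsupervised clustering
# that yields the 29 basic FAPPs, and PCA stands in for the extraction of the
# 15 basic orthogonal vectors. The 68-dimensional FAP frames are an assumption.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def extract_basic_fapps(fap_frames: np.ndarray, n_patterns: int = 29) -> np.ndarray:
    """Cluster FAP frame vectors (n_frames x n_fap_dims) into basic patterns."""
    kmeans = KMeans(n_clusters=n_patterns, n_init=10, random_state=0)
    kmeans.fit(fap_frames)
    return kmeans.cluster_centers_          # one centroid per basic FAPP


def extract_orthogonal_basis(fap_frames: np.ndarray, n_vectors: int = 15) -> np.ndarray:
    """Find orthogonal basis vectors from which FAPPs can be reconstructed."""
    pca = PCA(n_components=n_vectors)
    pca.fit(fap_frames)
    return pca.components_                  # mutually orthogonal directions


if __name__ == "__main__":
    fap_frames = np.random.rand(5000, 68)   # synthetic stand-in for tracked FAP data
    basic_fapps = extract_basic_fapps(fap_frames)
    basis = extract_orthogonal_basis(fap_frames)
    print(basic_fapps.shape, basis.shape)   # (29, 68) (15, 68)
```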
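Similarly, the sketch below illustrates the general shape of a FAPP-based audio-to-visual mapping with audio context, as in contribution 3: each speech feature frame is concatenated with its preceding and following frames and mapped to a basic FAPP class. The acoustic feature dimension, context width, and network architecture are assumptions; the dissertation's actual model (and the PDTN variant) is not reproduced here.

```python
# Illustrative sketch only: a small MLP classifier maps a context-augmented
# speech feature vector to an index into the set of basic FAPPs.
import numpy as np
from sklearn.neural_network import MLPClassifier


def add_context(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Concatenate each frame with `width` preceding and following frames."""
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    windows = [padded[i:i + len(features)] for i in range(2 * width + 1)]
    return np.hstack(windows)


if __name__ == "__main__":
    audio = np.random.rand(2000, 13)                   # synthetic 13-dim speech features
    fapp_labels = np.random.randint(0, 29, size=2000)  # synthetic FAPP class labels

    x = add_context(audio, width=2)                    # each sample now spans 5 frames
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
    model.fit(x, fapp_labels)
    print(model.predict(x[:10]))                       # predicted basic-FAPP indices
```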
Keywords/Search Tags: Data mining, Machine learning, Virtual human synthesis, Multi-modal behaviors, Synchronization, Prosody learning, Face animation, Sign language synthesis