Facial animation is a human-computer interaction technique that synthesizes facial expressions and motion by computer. It is an active research direction in the field of virtual reality and has been widely applied to virtual anchors, videophones, computer-aided teaching, medical research, entertainment, film and animation production, and other fields.

Voice-driven 3D facial lip animation belongs to the domain of multimodal human-computer interaction. In simple terms, it uses audio files to drive lip movement, generating lip motion that is synchronized with the speech. This technique enriches the content of the human-computer interface and improves the efficiency of human-computer interaction. It also reduces the network bandwidth required when sharing audio and video resources in real time. As a result, it is attracting increasing attention from researchers both at home and abroad.

In this paper, I study lip animation, voice-driven animation, and speech signal analysis and processing. I then design and implement a realistic 3D facial lip animation system that is based on the MPEG-4 standard and uses voice files as its driving source. The system is simple to operate, highly general, and efficient, and it meets the needs of real-time voice-driven lip animation. The main work of this paper is as follows.

First, I create a universal 3D face mesh model using 3ds Max. I then use the texture-mapping facility of Direct3D to map a face image onto the mesh, obtaining a realistic face model. Because the topological structure of the universal face mesh model is essentially the same for all faces, the driving method designed for this universal mesh can be applied to arbitrary subjects.

Second, I set the lip feature points. I first analyze the mutual influence of vowels and consonants in English pronunciation and summarize the typical lip movements.
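One way to represent such a summary of typical lip movements is to group phonemes into lip-shape categories (visemes). The sketch below is purely illustrative: the category names and phoneme members are hypothetical assumptions, not the classification actually used in this work.

```python
# Hypothetical viseme grouping: phoneme symbols (ARPAbet-style) mapped to
# illustrative lip-shape categories. Not the thesis's actual classification.
VISEME_GROUPS = {
    "closed":      {"p", "b", "m"},      # lips pressed together
    "rounded":     {"uw", "ow", "w"},    # lips rounded and protruded
    "wide_open":   {"aa", "ae"},         # jaw dropped, mouth open
    "spread":      {"iy", "ey"},         # lips spread horizontally
    "labiodental": {"f", "v"},           # lower lip against upper teeth
}

def viseme_of(phoneme: str) -> str:
    """Return the lip-shape category for a phoneme, or 'neutral' if the
    phoneme does not strongly constrain the lips."""
    for name, members in VISEME_GROUPS.items():
        if phoneme in members:
            return name
    return "neutral"
```

Grouping by visual similarity rather than by acoustic identity keeps the number of distinct lip targets small, which is what makes the later mapping from speech features to lip shapes tractable.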
To make lip control and driving more accurate, and to remain compatible with the MPEG-4 standard, I define 10 outer-lip feature points and 8 inner-lip feature points. I then construct a lip animation definition table that stores, for each lip FAP, the control information of the lip feature points. To drive the lip model, after obtaining the value of a lip FAP, we look up its effect area in the lip animation definition table and use the algorithm specified by MPEG-4 to compute new three-dimensional coordinates for all mesh vertices controlled by that FAP. The displacements contributed by the FAPs in the same group are superposed, and a new lip image is finally obtained.

Third, I extract the speech feature parameters. The input voice files are digitized, pre-emphasized, framed, windowed, and subjected to endpoint detection; for endpoint detection, I use a double-threshold method based on short-time energy and the short-time average zero-crossing rate. I then extract MFCC features and use a two-layer Hidden Markov Model to map MFCCs to lip animation parameters. The current speech frame, together with its preceding and following frames, is treated as the observation sequence of the first-layer mapping model. For each lip category, the speech observations within the class are re-clustered to obtain the second-layer mapping model, which considerably improves the realism of the synthesized visual speech. Once MFCCs are extracted from real-time voice, the mapping model yields the lip FAP information, which in turn drives the lip motion. This method effectively synchronizes human speech with lip animation and enhances the sense of realism of the animation.

Finally, I analyze the functional requirements and workflow of the voice-driven realistic 3D facial lip animation system.
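The speech pre-processing and double-threshold endpoint detection described in the third step can be sketched as follows. The frame length, hop size, pre-emphasis coefficient, and threshold values are illustrative assumptions, not the parameters actually used in the system.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """Pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(x, frame_len=256, hop=128):
    """Split the signal into overlapping frames and apply a Hamming window."""
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])
    return frames * np.hamming(frame_len)

def short_time_energy(frames):
    return np.sum(frames ** 2, axis=1)

def short_time_zcr(frames):
    """Short-time average zero-crossing rate per frame."""
    signs = np.sign(frames)
    return np.sum(np.abs(np.diff(signs, axis=1)), axis=1) / (2.0 * frames.shape[1])

def detect_endpoints(frames, e_high, e_low, z_low):
    """Double-threshold endpoint detection: find the region whose energy
    exceeds the high threshold, then extend it outward while the energy
    stays above the low threshold or the ZCR stays above its threshold."""
    energy = short_time_energy(frames)
    zcr = short_time_zcr(frames)
    loud = np.flatnonzero(energy > e_high)
    if loud.size == 0:
        return None                      # no speech segment found
    start, end = loud[0], loud[-1]
    while start > 0 and (energy[start - 1] > e_low or zcr[start - 1] > z_low):
        start -= 1
    while end < len(frames) - 1 and (energy[end + 1] > e_low or zcr[end + 1] > z_low):
        end += 1
    return start, end

# Synthetic check: 0.2 s of silence, 0.3 s of a 1 kHz tone, 0.2 s of silence.
fs = 8000
t = np.arange(int(0.3 * fs)) / fs
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
signal = np.concatenate([np.zeros(1600), tone, np.zeros(1600)])
frames = frame_and_window(pre_emphasis(signal))
print(detect_endpoints(frames, e_high=1.0, e_low=0.1, z_low=0.1))
```

The low-energy/high-ZCR extension step is what recovers weak unvoiced consonants (fricatives, stops) at word boundaries, which a single energy threshold would clip off.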
I program the system with the Microsoft Direct3D SDK and Visual C++ 6.0, and implement a system that accepts voice input in real time and outputs realistic lip animation on the face model.
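The FAP-driven vertex update described in the second step can be sketched as follows. The table layout, vertex indices, weights, and FAP-unit scaling below are simplified assumptions for illustration; the actual MPEG-4 FAP definition tables are more detailed.

```python
import numpy as np

# Hypothetical, simplified lip animation definition table: for each FAP,
# the vertex indices in its effect area, a motion direction, and a
# per-vertex weight. Values are illustrative, not from the MPEG-4 tables.
FAP_TABLE = {
    "open_jaw": {
        "vertices": [4, 5, 6],
        "direction": np.array([0.0, -1.0, 0.0]),   # move lower lip down
        "weights": [1.0, 0.8, 0.8],
    },
    "stretch_l_cornerlip": {
        "vertices": [6, 7],
        "direction": np.array([1.0, 0.0, 0.0]),    # pull lip corner outward
        "weights": [1.0, 0.5],
    },
}

def apply_faps(vertices, fap_values, fap_unit=0.01):
    """Displace mesh vertices for a group of FAPs, superposing the
    displacement each FAP contributes to shared vertices."""
    out = vertices.copy()
    for fap, value in fap_values.items():
        entry = FAP_TABLE[fap]
        for idx, w in zip(entry["vertices"], entry["weights"]):
            out[idx] += value * fap_unit * w * entry["direction"]
    return out

# Vertex 6 lies in both effect areas, so its displacement is the sum of
# the two FAPs' contributions.
mesh = np.zeros((8, 3))
new_mesh = apply_faps(mesh, {"open_jaw": 100.0, "stretch_l_cornerlip": 50.0})
```

Superposing per-FAP displacements on shared vertices, rather than letting the last FAP overwrite earlier ones, is what allows compound lip shapes (e.g. an open and stretched mouth) to emerge from independent parameters.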