
Speech-Driven Facial Animation With High Naturalness

Posted on: 2020-02-20  Degree: Master  Type: Thesis
Country: China  Candidate: L Xiao  Full Text: PDF
GTID: 2428330572974416  Subject: Control Science and Engineering
Abstract/Summary:
Speech animation aims to synthesize lip-synchronized facial animation from a given speech sequence. Automated speech-animation synthesis plays an important role in modern film, the digital games industry, and related fields, and has a decisive influence on the construction and presentation of virtual characters. In addition, research in cognitive psychology has shown that combined auditory and visual input promotes the understanding of speech more effectively than auditory input alone, and that a mismatch between acoustic and visual speech confuses listeners or even changes what they hear. The purpose of this dissertation is therefore to present a novel method for speech-driven facial animation that achieves high naturalness and synchronizes the synthesized lip motion with the speech.

For the 3D face animation system, we combine a parameterized model with a blendshape model. First, a parameterized method is proposed to model the motion of the lower part of the face based on selected articulatory control points; it achieves subtle control of the chosen area and can manipulate the lower teeth to coordinate with the lips. Meanwhile, the blendshape method is adopted to edit facial expressions and other micro-motions. Moreover, personalized synthesis with the blendshape model is combined with the refined control of the parameterized model to synthesize high-quality animation for an arbitrary 3D face model.

For speech-driven articulatory trajectory synthesis, we explicitly divide the task into feature extraction, context coding, and multi-branch decoding. First, drawing on the field of computer vision, we extract generalized features from the speech sequence with a densely connected convolutional neural network. We then adopt a bidirectional recurrent neural network to effectively model the phenomenon of phoneme coarticulation. Finally, a multi-branch structure applies a multi-domain learning strategy to improve the precision of the predicted trajectories.

To generate photo-realistic video driven by speech, we decompose the face video into appearance information and shape-sequence information. First, 3D lip motion is synthesized from the speech; its key points are then extracted and merged with the key points of the target face, followed by head-pose editing and contour fitting to obtain the shape sequences. Second, a fixed reference image is used to extract appearance information, based on training with the designed network structure. Finally, we design a coarse-to-fine generator and adopt spatial and temporal discriminators to learn video generation, so that a realistic image sequence can be synthesized from the shape sequences and the reference image.

Building on these methods, this dissertation establishes a novel and complete speech-driven facial animation system that can synthesize the corresponding 3D face animation from a given speech input and further generate a realistic video. Extensive experiments indicate that the proposed method is highly practical: it is speaker-independent, it can synthesize natural face animation of any 3D face model synchronized with the input speech, and the generated face video achieves high realism and temporal coherence with no limit on video length.
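The blendshape editing mentioned above follows the standard linear blendshape formula, v = v0 + Σ w_i (b_i − v0): a neutral face plus weighted offsets of target shapes. A minimal sketch in pure Python, using a toy two-coordinate "face" and illustrative shape names (the function and data here are hypothetical, not the dissertation's implementation):

```python
def blend(neutral, blendshapes, weights):
    """Linearly combine blendshape offsets with a neutral face:
    v = v0 + sum_i w_i * (b_i - v0)."""
    result = list(neutral)
    for shape, w in zip(blendshapes, weights):
        for j, (bv, nv) in enumerate(zip(shape, neutral)):
            result[j] += w * (bv - nv)
    return result

# Toy example: a face reduced to two coordinates, two blendshapes
# (e.g. "jaw open" and "smile"), each applied at partial strength.
neutral = [0.0, 0.0]
shapes = [[1.0, 0.0], [0.0, 2.0]]
print(blend(neutral, shapes, [0.5, 0.25]))  # -> [0.5, 0.5]
```

In practice each entry would be a full 3D vertex array, but the per-coordinate arithmetic is identical; the parameterized model then refines the lower-face region on top of this result.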
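The three-stage trajectory pipeline (feature extraction, context coding, multi-branch decoding) can be sketched as a simple data flow. All three functions below are toy stand-ins for the networks described in the abstract: the "CNN" is a scaling, the "BiRNN" is a symmetric neighborhood average (capturing the idea that coarticulation makes each frame depend on both past and future context), and the "branches" are constant offsets:

```python
def extract_features(frames):
    # Stand-in for the densely connected CNN feature extractor.
    return [f * 2.0 for f in frames]

def bidirectional_context(feats):
    # Stand-in for the bidirectional RNN: each step sees its left and
    # right neighbours, mimicking phoneme coarticulation.
    out = []
    for i, f in enumerate(feats):
        left = feats[i - 1] if i > 0 else 0.0
        right = feats[i + 1] if i < len(feats) - 1 else 0.0
        out.append((left + f + right) / 3.0)
    return out

def multi_branch_decode(ctx):
    # Stand-in for the multi-branch decoder: separate output heads,
    # e.g. one per articulator domain (names are illustrative).
    return {"lips": [c + 0.1 for c in ctx], "jaw": [c - 0.1 for c in ctx]}

traj = multi_branch_decode(bidirectional_context(extract_features([1.0, 2.0, 3.0])))
```

The point of the sketch is the interface: per-frame speech features go in, context-smoothed states come out, and each branch maps the shared states to one trajectory domain.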
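The shape-sequence assembly step merges the synthesized lip key points into the target face's key points before head-pose editing and contour fitting. A hypothetical sketch of just the merge, with invented key-point names and coordinates:

```python
def merge_keypoints(target_face, synthesized_lips):
    """Replace the lip entries of the target face's key points with the
    speech-driven ones; all other facial key points are kept unchanged."""
    merged = dict(target_face)
    merged.update(synthesized_lips)
    return merged

# Illustrative 2D key points (names and values are made up).
face = {"left_eye": (30, 40), "upper_lip": (50, 80)}
lips = {"upper_lip": (50, 78)}  # projected from the synthesized 3D lip motion
print(merge_keypoints(face, lips))
```

Per frame, one such merged key-point set forms one element of the shape sequence that conditions the coarse-to-fine generator.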
Keywords/Search Tags:3D face animation, parameterized model, blend shapes, speech-driven, articulator, video synthesis