
Design And Implementation Of Speech-driven Facial Video Synthesis System

Posted on: 2021-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2518306023975299
Subject: Computer technology

Abstract/Summary:
Research in human evolutionary psychology shows that people obtain more information from the dual-modal input of a speech sequence and facial animation than from either modality alone, and understand that information more effectively. Speech animation is a technology that synthesizes facial animation consistent and synchronized with a speech sequence. It has wide application value in human-computer interaction, movies, games, and related fields, and is one of the core technical foundations for generating the facial expressions and animations of virtual characters. This dissertation studies the mapping model between facial features and speech feature parameters in speech animation, and designs and implements a speech-driven facial video synthesis system.

Firstly, a mapping model between facial features and speech feature parameters based on the deep network Bi-LSTM is proposed. The model is trained on synchronized audio-video dual-modal data to learn the mapping from the MFCC speech feature parameters to the CLM facial feature landmarks. Secondly, a speech-driven facial animation generation algorithm is proposed. The algorithm obtains predicted facial feature landmarks for the driving speech from the trained mapping model, and then combines affine transformation and video coding technology to generate the facial animation video.

The experiments used about 1000 minutes of weekly radio address video clips from Obama's presidency as the training corpus. The mapping-model experiments show that the Bi-LSTM-based model proposed in this dissertation significantly outperforms a unidirectional LSTM; after parameter tuning, the prediction accuracy reached 89.5%. The facial animation generation experiments show that the synthesized video is natural and smooth, with a frame rate of 100 fps. For the same driving speech input, the average SSE, the objective evaluation criterion, reached 9.19; the subjective scores for fluency and fidelity of the generated videos were 7.84 and 8.98, respectively, out of a full mark of 10.

Finally, based on the aforementioned mapping model and facial animation synthesis method, a speech-driven facial video synthesis system with a B/S architecture is designed and implemented. The system is easy to operate and can synthesize natural, synchronized facial video output for arbitrary driving speech.
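The abstract does not give implementation details of the mapping model. As an illustration only, a minimal PyTorch sketch of the kind of Bi-LSTM described (a sequence of MFCC frames in, landmark coordinates out) might look as follows; the dimensions (13 MFCC coefficients, 68 CLM landmarks flattened to 136 coordinates) and all names are assumptions, not taken from the dissertation:

```python
import torch
import torch.nn as nn

class MfccToLandmarks(nn.Module):
    """Bi-LSTM mapping a sequence of MFCC frames to facial landmark coordinates.
    Dimensions (13 MFCCs, 68 landmarks -> 136 coords) are illustrative assumptions."""
    def __init__(self, n_mfcc=13, hidden=128, n_coords=136):
        super().__init__()
        # A bidirectional LSTM reads the audio feature sequence in both directions,
        # so each frame's prediction can use past and future acoustic context.
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        # A linear head projects the concatenated forward/backward states to landmarks.
        self.head = nn.Linear(2 * hidden, n_coords)

    def forward(self, mfcc):          # mfcc: (batch, frames, n_mfcc)
        out, _ = self.lstm(mfcc)      # out: (batch, frames, 2 * hidden)
        return self.head(out)         # (batch, frames, n_coords)
```

Training such a model would minimize a regression loss (e.g. MSE) between the predicted landmark trajectories and ground-truth landmarks extracted from the synchronized video frames.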
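The affine-transformation step of the animation generation algorithm can be sketched as follows. This is a minimal NumPy example (function names are hypothetical, not from the dissertation) that estimates, by least squares, a 2×3 affine matrix mapping a reference frame's landmarks onto the predicted landmarks; such a matrix could then drive warping of the corresponding face region:

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src (N, 2) landmarks onto dst (N, 2)."""
    n = src.shape[0]
    # Homogeneous coordinates: each row is [x, y, 1].
    A = np.hstack([src, np.ones((n, 1))])
    # Solve A @ X = dst for X (3, 2); transpose to the conventional (2, 3) layout.
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return X.T

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix M to (N, 2) points."""
    return pts @ M[:, :2].T + M[:, 2]
```

With at least three non-collinear landmark pairs the system is well determined; applying the estimated matrix per frame and encoding the warped frames yields the output video.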
Keywords/Search Tags: speech animation, face animation synthesis, speech-driven, Bi-LSTM, video synthesis