Research On Speech Driven Talking Face Video Generation

Posted on:2022-07-20

Degree:Master

Type:Thesis

Country:China

Candidate:W T Wang

Full Text:PDF

GTID:2518306542466684

Subject:Control Engineering

Abstract/Summary:

Speech-driven talking face video generation aims to synthesize a video of a talking face from arbitrary audio and an arbitrary head image of a given person. The technology is widely used in film production, virtual news broadcasting, virtual speeches, and similar applications. Current research on speech-driven talking face video generation focuses mainly on the quality of facial synthesis and the accuracy of lip movement, but neglects the speaker's head motion. In addition, when facial landmarks serve as the intermediate variables for face synthesis, the structural similarity of the key points is one of the main factors limiting lip-movement accuracy. In previous studies, the details of head motion in synthesized talking face videos were also unsatisfactory. To address these problems, this thesis proposes a method that uses facial landmarks as intermediate variables to generate natural head movements, accurate lip movements, and high-quality facial expressions. Quantitative and qualitative results show that the proposed method synthesizes clear, natural talking face videos with head motion, and that it outperforms existing methods.

The main contents and contributions of the thesis are as follows.

First, a speech-splitting neural network with facial landmarks as intermediate variables is studied. A convolutional neural network and a recurrent neural network decompose the speech information into head motion information and semantic information. By separating the head landmarks from the lip landmarks, the head motion information and the semantic information in the input speech are mapped to the face contour landmarks and the lip contour landmarks, respectively.

Second, a loss function is studied to improve the accuracy of the facial landmarks. The function dynamically adjusts the landmark loss during training. It addresses the underfitting caused by the similarity of key points in the face structure while keeping training stable.

Third, a talking face video generation network is studied, which synthesizes face video from a continuous lip landmark sequence, a head landmark sequence, and template images. A channel attention mechanism is then introduced so that the network obtains more accurate head motion information and lip landmark semantic information.
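
The landmark separation and the dynamically adjusted landmark loss described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the landmark layout nor the loss formula, so everything here is an assumption made for illustration: the widely used 68-point layout (jaw line 0-16, lips 48-67), the helper names `split_landmarks`, `dynamic_weights`, and `landmark_loss`, and the rule that weights each landmark group by its share of the current error. It is not the thesis's actual loss.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Assumption: the common 68-point landmark layout (jaw line 0-16, lips 48-67).
# The abstract does not state which layout the thesis uses.
CONTOUR_IDX = range(0, 17)
LIP_IDX = range(48, 68)


def split_landmarks(points: Sequence[Point]) -> Dict[str, List[Point]]:
    """Separate face-contour points (head motion) from lip points (speech content)."""
    if len(points) != 68:
        raise ValueError("expected 68 landmarks")
    return {
        "contour": [points[i] for i in CONTOUR_IDX],
        "lips": [points[i] for i in LIP_IDX],
    }


def mse(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared distance between corresponding points."""
    total = sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred, target)
    )
    return total / len(pred)


def dynamic_weights(errors: Dict[str, float], eps: float = 1e-8) -> Dict[str, float]:
    """Hypothetical rule: weight each group by its share of the current error,
    so the group that is fitted worse gets more emphasis. Weights sum to 1."""
    total = sum(errors.values()) + eps * len(errors)
    return {name: (err + eps) / total for name, err in errors.items()}


def landmark_loss(
    pred: Sequence[Point], target: Sequence[Point]
) -> Tuple[float, Dict[str, float]]:
    """Weighted sum of per-group errors, with weights recomputed on every call."""
    p, t = split_landmarks(pred), split_landmarks(target)
    errors = {name: mse(p[name], t[name]) for name in p}
    weights = dynamic_weights(errors)
    loss = sum(weights[name] * errors[name] for name in errors)
    return loss, weights
```

Calling `landmark_loss(pred, target)` on two 68-point lists returns the weighted loss together with the weights used for that step. In a real training loop the weights would normally be treated as constants when computing gradients.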

Keywords/Search Tags:

Talking Face, Facial Landmark, Lip Motion, Head Motion, Face Video

Related items

1	Multi-core Accelerated Acquisition Of Human Face And Facial Features From Video Sequence
2	Facial Landmark Localization Under Large Pose Variation
3	Text/Speech-Driven Talking Face Generation With High Naturalness
4	Research On Face Recognition Under Unconstrained Conditions
5	Facial Motion Capture Based On Sparse Representation And Cascaded Regression
6	A Method Of Face Detection Based On Facial Landmark
7	Research On Real-time Face Landmark Detection And Tracking Technology Based On Video
8	Key Techniques Of Content-based Intelligent Video Surveillance And The Applications In Public Security
9	Research Of Facial Landmark Location And Face Recognition
10	Facial Landmark Detection Via Multi-task Feature Selection And Self-adapted Model