
Design And Implementation Of Video Generation Based On Audio Drive

Posted on: 2022-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Li
GTID: 2518306524493454
Subject: Master of Engineering
Abstract/Summary:
The human face, as the most expressive and individual external feature of a person, is the direct carrier of emotion in communication. Facial expressions combined with lip movements convey additional information and greatly improve a listener's understanding of speech. The synchronization of mouth shape and voice largely determines the quality of a face animation, so a realistic and natural talking-face video can be generated by also exploiting factors such as the speaker's tone and emotion in the audio. Audio-driven face video generation can improve production efficiency in online education and tutoring, news broadcasting, film and television editing and correction, and large-scale 3D game production. It also improves the audio-visual experience of users and saves both producers and users unnecessary time. Audio-driven face video generation is therefore a research topic of considerable practical significance and value.

Against this background, this thesis studies and designs an audio-driven video generation method. Talking-face video generation comprises three modules: 3D face reconstruction based on weakly supervised learning, audio-driven face image generation, and audio-driven video generation. Together they give the generated talking-face video audio-visual synchronization and a realistic, natural face. This thesis mainly completes the following tasks:

1. Study 3D face reconstruction methods, and design and implement weakly supervised 3D face reconstruction of the input face. The method uses no additional label information; the positions of the detected facial key points serve as the weak supervision signal. By training a parameter extraction network, a 3D face image similar to the input image is obtained.
2. Study the correlation between audio information and face images, and design and implement a network that maps audio information to facial expression and pose parameters. The network is based on long short-term memory (LSTM) to make the mapping between parameters more accurate. The resulting expression and pose parameters are combined with the saved 3D face parameters to generate a face image containing the target mouth shape.
3. Study how to match the face image with the background of the video frame, and propose a background frame matching method that ensures the generated video contains natural face poses. Part of the face pose in the input video is retained so that the generated face video frames look natural.
4. Study face video generation and, based on generative adversarial networks (GANs), propose a network model and loss function that perform style transfer on partial images, so that the generated face video frames are natural and realistic and the generated face videos are smooth and clear.
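
The weak supervision signal in task 1 (detected facial key points rather than extra labels) can be illustrated with a small sketch. The abstract does not state the camera model or the exact loss, so the weak-perspective projection, the mean squared error, and all function names below are assumptions made for illustration only, not the thesis's implementation.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def project_weak_perspective(points: List[Point3D], scale: float,
                             tx: float, ty: float) -> List[Point2D]:
    """Drop depth, then scale and translate (an assumed camera model)."""
    return [(scale * x + tx, scale * y + ty) for x, y, _z in points]


def landmark_loss(predicted_3d: List[Point3D], detected_2d: List[Point2D],
                  scale: float = 1.0, tx: float = 0.0, ty: float = 0.0) -> float:
    """Mean squared 2D distance between projected and detected key points.

    predicted_3d: key points of the reconstructed 3D face (hypothetical input).
    detected_2d:  key points found by a face landmark detector on the image.
    """
    if len(predicted_3d) != len(detected_2d) or not detected_2d:
        raise ValueError("landmark lists must be non-empty and equal in length")
    projected = project_weak_perspective(predicted_3d, scale, tx, ty)
    total = sum((px - dx) ** 2 + (py - dy) ** 2
                for (px, py), (dx, dy) in zip(projected, detected_2d))
    return total / len(detected_2d)
```

In a training loop of this kind, the loss would be minimized with respect to the parameters that produce `predicted_3d`, so the only supervision comes from the detector's key points.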
Keywords/Search Tags: face animation, video generation, adversarial training, audio-driven