
3D Face Reconstruction And Its Applications

Posted on: 2022-07-15    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y D Guo    Full Text: PDF
GTID: 1488306323980159    Subject: Computational Mathematics
Abstract/Summary:
Recently, with the popularity of smart mobile devices and the impact of the epidemic, people have come to rely increasingly on digital communication methods such as video chats and video conferences. On the other hand, with the rapid development of artificial intelligence technology represented by deep learning, face recognition is widely used in public security and in daily-life scenarios such as security checks and payment. The face has therefore become an essential digital ID for each individual. For privacy or entertainment reasons, people often need a virtual digital avatar to appear on screen in their place during digital communication. The avatar can be an exact digital copy of the user, or a completely virtual character with no specific relationship to the user. In either case, real-time 3D face reconstruction and tracking are required. The topic of this thesis is to improve the face reconstruction and tracking technology required by current digital communication methods and to develop facial video editing applications that build on this technology. The following results have been achieved:

Deep learning-based real-time monocular 3D face reconstruction. In recent years, deep learning has been widely applied to 3D face reconstruction from a single image, but rarely to real-time 3D face tracking from video input. This thesis designs a novel network architecture, 3DFaceNet, which reconstructs 3D human faces in real time from monocular video. The core of the algorithm is to synthesize a large-scale, realistic face image dataset with 3D labels by inverse rendering and to train 3DFaceNet on this dataset (see the first sketch below).

3D face reconstruction and model learning from diverse sources. Existing 3D face reconstruction algorithms generally rely on a parametric face model, which is typically trained from a single data source such as high-quality 3D scans or 2D facial images. Although 3D scans contain accurate geometric information about face shapes, the capture systems are expensive and such datasets usually cover only a small number of subjects. In-the-wild face images, by contrast, are easy to obtain in large quantities but carry no explicit geometric information. This thesis proposes a method to learn a unified face model from diverse sources: besides scanned face data and face images, a large number of RGB-D images captured with an iPhone X are used to bridge the gap between the two sources (see the second sketch below). Experimental results demonstrate that training data from more sources yields a more powerful face model.
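The first sketch below is a minimal, hypothetical coefficient-regression network in the spirit of 3DFaceNet: a lightweight CNN maps one face crop to 3DMM-style coefficients. The backbone choice, the parameter split, and all dimensions are assumptions for illustration, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CoeffRegressor(nn.Module):
    """Regress 3DMM-style coefficients (identity, expression, pose,
    lighting) from a single RGB face crop. All sizes are illustrative."""

    def __init__(self, n_id=80, n_exp=64, n_pose=6, n_light=27):
        super().__init__()
        # A small backbone keeps inference fast enough for real-time video.
        self.backbone = models.resnet18(weights=None)
        n_out = n_id + n_exp + n_pose + n_light
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_out)
        self.dims = [n_id, n_exp, n_pose, n_light]

    def forward(self, img):
        # img: (B, 3, 224, 224) normalized face crops
        coeffs = self.backbone(img)
        # Split the flat vector into per-group coefficient tensors.
        return torch.split(coeffs, self.dims, dim=1)
```

Such a regressor can only be trained this way because the synthesized dataset supplies dense 3D labels for every image; real photographs alone would not provide the ground-truth coefficients.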
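The second sketch shows how supervision from diverse sources could be combined in a single training objective. The dictionary keys, tensor shapes, and loss weights are all assumptions; the thesis's actual objective is not reproduced here.

```python
import torch.nn.functional as F

def multi_source_loss(pred, batch, w_scan=1.0, w_depth=0.5, w_photo=0.1):
    """Combine whatever supervision each sample carries.

    pred:  model outputs, e.g. 'verts' (N, 3) mesh vertices,
           'depth' (H, W) rendered depth, 'rgb' (H, W, 3) rendered image.
    batch: ground truth under the matching keys.
    """
    loss = 0.0
    if "scan_points" in batch:   # accurate geometry from 3D scans
        loss = loss + w_scan * F.l1_loss(pred["verts"], batch["scan_points"])
    if "depth_map" in batch:     # consumer RGB-D (e.g. iPhone X) bridges the gap
        loss = loss + w_depth * F.l1_loss(pred["depth"], batch["depth_map"])
    if "image" in batch:         # photometric term for in-the-wild photos
        loss = loss + w_photo * F.l1_loss(pred["rgb"], batch["image"])
    return loss
```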
Real-time face view correction for front-facing cameras. The line of sight plays an extremely important role in interpersonal communication, so for a good experience in digital communication such as video conferencing, users should appear to face the camera. However, in existing one-to-one video conferences on mobile devices or laptops, users tend to look at the screen to watch the other person's face, and because the camera generally sits outside the screen, they rarely look directly into it. To tackle this issue, this thesis applies the face reconstruction technology above to achieve real-time face view correction by rendering a pose-corrected 3D face (see the third sketch below).

Audio-driven neural radiance fields for talking head synthesis. Generating high-fidelity talking head video that matches an input audio sequence is a challenging problem that has received considerable attention recently. Most existing algorithms based on generative adversarial networks or neural rendering must establish the voice-to-video correspondence through intermediate modalities such as facial keypoints or expression coefficients, which can lose information. This thesis proposes to use the recently introduced neural radiance fields to directly learn the cross-modal mapping from the semantic audio signal to talking head video. The resulting model supports talking head generation from arbitrary audio input, and the volume rendering-based representation provides a natural way to freely change the pose of the speaker, a feature that is extremely useful for applications such as virtual video conferencing (see the fourth sketch below).
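The third sketch illustrates the view-correction idea: re-pose the reconstructed mesh toward a frontal orientation before re-rendering it with the recovered texture. The function name, the linear rotation blend, and the SVD re-projection are illustrative assumptions, not the thesis's method.

```python
import numpy as np

def correct_view(vertices, R_est, t_est, alpha=1.0):
    # vertices: (N, 3) reconstructed face mesh in head coordinates
    # R_est (3, 3), t_est (3,): rigid head pose estimated by the tracker
    # alpha: 1.0 snaps fully to a frontal pose; smaller values soften it
    R_frontal = np.eye(3)
    R = (1.0 - alpha) * R_est + alpha * R_frontal
    # Project the blended matrix back onto SO(3); a real system would
    # interpolate rotations properly (e.g. quaternion slerp).
    U, _, Vt = np.linalg.svd(R)
    R = U @ Vt
    if np.linalg.det(R) < 0:          # guard against a reflection
        U[:, -1] *= -1
        R = U @ Vt
    return vertices @ R.T + t_est     # re-posed mesh, ready to render
```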
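Finally, the fourth sketch gives a minimal audio-conditioned radiance field: an MLP that maps a 3D sample point, a view direction, and a per-frame audio feature to density and color, to be queried inside a standard volume renderer. Positional encoding and the audio encoder are omitted for brevity; all layer sizes and the audio-feature dimension are assumptions.

```python
import torch
import torch.nn as nn

class AudioNeRF(nn.Module):
    """Sketch of an audio-conditioned radiance field MLP."""

    def __init__(self, d_pos=3, d_dir=3, d_audio=64, d_hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(d_pos + d_audio, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
        )
        self.sigma = nn.Linear(d_hidden, 1)            # volume density
        self.color = nn.Sequential(
            nn.Linear(d_hidden + d_dir, d_hidden // 2), nn.ReLU(),
            nn.Linear(d_hidden // 2, 3), nn.Sigmoid(), # RGB in [0, 1]
        )

    def forward(self, x, d, a):
        # x: (B, 3) sample points, d: (B, 3) view directions,
        # a: (B, d_audio) per-frame audio features
        h = self.trunk(torch.cat([x, a], dim=-1))
        sigma = torch.relu(self.sigma(h))
        rgb = self.color(torch.cat([h, d], dim=-1))
        return rgb, sigma
```

Because the audio feature conditions the field directly, no intermediate keypoints or expression coefficients are needed, which is the information-loss argument made in the abstract.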
Keywords/Search Tags: Face Reconstruction, Digital Communication, Face View Correction, Audio Driven Face Synthesis, Neural Rendering