
Facial Expression Translation Based On Deep Learning

Posted on: 2021-04-03    Degree: Master    Type: Thesis
Country: China    Candidate: S Y Wang    Full Text: PDF
GTID: 2518306050971569    Subject: Intelligent information processing
Abstract/Summary:
In recent years, with the development of image translation, facial expression translation has found wide application in fake-face generation, facial expression database construction, face editing, and related tasks. However, existing facial expression translation networks suffer from three main problems. First, most are built on the Generative Adversarial Network (GAN), so the unstable training of GANs carries over to expression translation. Second, existing networks fail to reconstruct facial details, leaving blurry patches in the translated image. Third, traditional networks cannot capture the important structural information and geometric relationships underlying a face image. To address these problems, this thesis improves the existing facial expression translation network in three respects: network normalization, the loss function, and the network structure, aiming to stabilize training, recover the high-frequency details of the face, and improve translation quality.

To solve the image distortion caused by the instability of existing facial expression translation networks, a translation network based on spectral normalization is designed. First, the Ensemble of Regression Trees algorithm is used to detect facial keypoints and construct a keypoint image that guides the expression translation. Second, the loss function of the translation network is designed, the U-net is improved, and the improved network serves as the translation generator. Finally, spectral normalization is introduced into the Markov discriminator. Because a spectrally normalized network satisfies a Lipschitz condition, training is stable and the distortion of the translated image is
improved. Experiments on the Radboud dataset show that spectral normalization raises SSIM from 0.7649 to 0.7900 and PSNR from 19.2301 dB to 19.5268 dB. Subjective observation also shows that embedding spectral normalization in the translation network effectively suppresses blur and distortion in the face image.

To address the generator's insufficient ability to reconstruct facial details, a mask-based facial expression translation network is proposed. First, the facial keypoint image is processed morphologically to construct a mask dataset. Second, a conditional mask translation network is designed, which takes the mask image as an additional condition to assist the translation process. Finally, the original translation loss is improved with a mask loss that guides the reconstruction of facial details. Because the mask loss weights the facial region more heavily, detail reconstruction improves. Experiments on the Radboud dataset show that the proposed network improves SSIM from 0.7649 to 0.7932 and PSNR from 19.2301 dB to 19.6556 dB; the face-region SSIM rises from 0.9946 to 0.9963 and the face-region PSNR from 36.3474 dB to 38.2814 dB. Subjective observation shows that the mask-based method adds more facial detail to the translated image.

To address the low translation quality caused by the network's failure to model the structural information and geometric relationships of the face image, a self-attention facial expression translation algorithm based on multi-scale pooling is proposed. First, a multi-scale pooling module is designed to extract multi-scale information from the middle feature layer; this module and a self-attention module are then embedded in the middle feature layer of the U-net generator. Owing to the combination of multi-scale feature extraction and
the attention mechanism, the network can capture structural information at multiple scales in the image, improving the quality of translation. Experiments on the Radboud dataset show that this method improves SSIM from 0.7649 to 0.7955 and PSNR from 19.2301 dB to 19.6248 dB. Subjective observation shows that the method captures facial structural information such as wrinkles and improves the overall quality of the translated image.
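The spectrally normalized Markov (PatchGAN-style) discriminator described in the first contribution can be sketched as below. The layer widths, kernel sizes, and the 6-channel input (face image concatenated with the keypoint image) are illustrative assumptions, not the thesis's exact configuration; spectral normalization is applied with PyTorch's built-in `nn.utils.spectral_norm`, which constrains each layer's Lipschitz constant and thereby stabilizes adversarial training.

```python
import torch
import torch.nn as nn

def sn_conv(in_ch, out_ch, stride):
    # Wrap each convolution in spectral normalization so the
    # discriminator satisfies a Lipschitz condition.
    return nn.utils.spectral_norm(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1))

class MarkovDiscriminator(nn.Module):
    """PatchGAN-style (Markov) discriminator with spectral normalization.

    Layer widths and the 6-channel input (RGB face + keypoint image)
    are assumptions for illustration.
    """
    def __init__(self, in_ch=6):
        super().__init__()
        self.net = nn.Sequential(
            sn_conv(in_ch, 64, 2), nn.LeakyReLU(0.2),
            sn_conv(64, 128, 2),   nn.LeakyReLU(0.2),
            sn_conv(128, 256, 2),  nn.LeakyReLU(0.2),
            sn_conv(256, 512, 1),  nn.LeakyReLU(0.2),
            sn_conv(512, 1, 1),    # per-patch real/fake score map
        )

    def forward(self, x):
        return self.net(x)
```

Because the output is a score map rather than a single scalar, each score judges only a local receptive field, which is what makes the discriminator "Markovian".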
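The mask loss from the second contribution — an L1 reconstruction term that weights the facial region more heavily — can be sketched as follows. The weighting scheme and the `face_weight` value are assumptions for illustration; the thesis's exact formulation and weight are not given in this abstract.

```python
import torch

def mask_l1_loss(fake, real, mask, face_weight=10.0):
    """Mask-weighted L1 reconstruction loss (sketch).

    mask: 1 inside the facial region, 0 elsewhere; pixels where
    mask == 1 receive `face_weight`, others receive weight 1, so the
    generator is pushed to reconstruct facial details more faithfully.
    `face_weight=10.0` is an illustrative value, not the thesis's.
    """
    weight = 1.0 + (face_weight - 1.0) * mask
    return (weight * (fake - real).abs()).mean()
```

In training this term would simply be added to the adversarial loss with a balancing coefficient.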
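The third contribution combines multi-scale pooling with self-attention in the generator's middle feature layer. A minimal sketch is given below, assuming a PSPNet-style pooling pyramid and a SAGAN-style self-attention block; the pool sizes, channel reductions, and residual scaling are assumptions, as the abstract does not specify the module internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePooling(nn.Module):
    """Pyramid-style multi-scale pooling (pool sizes are assumptions)."""
    def __init__(self, ch, sizes=(1, 2, 4)):
        super().__init__()
        self.sizes = sizes
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch // len(sizes), 1) for _ in sizes)
        self.fuse = nn.Conv2d(ch + (ch // len(sizes)) * len(sizes), ch, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x]
        for s, conv in zip(self.sizes, self.branches):
            p = F.adaptive_avg_pool2d(x, s)      # pool to an s x s grid
            feats.append(F.interpolate(conv(p), size=(h, w),
                                       mode='bilinear', align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over the middle feature map."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).view(b, -1, h * w)           # B x C' x N
        k = self.k(x).view(b, -1, h * w)           # B x C' x N
        v = self.v(x).view(b, c, h * w)            # B x C  x N
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # B x N x N
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                # residual connection
```

Embedding both modules at the U-net bottleneck lets the network mix pooled context at several scales with long-range pairwise dependencies, which is the mechanism the abstract credits for capturing structural cues such as wrinkles.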
Keywords/Search Tags:Deep Learning, Generative Adversarial Network, Face Expression Translation, Mask Loss, Multi-scale Attention