
Generative Facial Image Synthesis And Analysis

Posted on: 2020-03-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H B Huang
GTID: 1368330575469014
Subject: Computer application technology
Abstract/Summary:
Facial image synthesis and analysis is one of the most significant research directions in machine learning and computer vision. The development of deep learning has brought remarkable breakthroughs. For example, state-of-the-art methods can synthesize facial images that humans cannot tell from real ones, and machines outperform humans in many face recognition scenarios. Facial image synthesis and analysis has been widely applied and plays an increasingly important role in national security and people's livelihood. However, many challenges remain, such as the difficulty of training high-resolution image generators, the limited controllability and diversity of conditional image synthesis, the robustness of face analysis in hard cases, and the restoration of low-quality facial images. To address these problems, we propose several novel approaches for high-resolution facial image synthesis, conditional synthesis and analysis of facial images, and facial image restoration, respectively. The main work is summarized as follows:

1. We present a novel introspective variational autoencoder (IntroVAE) model to synthesize high-resolution photographic images. IntroVAE is capable of self-evaluating the quality of its generated samples and improving itself accordingly. Its inference and generator models are jointly trained in an introspective way. On one hand, the generator is required to reconstruct the input images from the noisy outputs of the inference model, as in classical VAEs. On the other hand, the inference model is encouraged to discriminate the generated samples from real ones, while the generator tries to fool it, as in GANs. These two well-known generative schemes are integrated in a simple yet efficient single-stream architecture that can be trained in a single stage. IntroVAE preserves the advantages of VAEs, including stable training and a nice latent manifold. Unlike most other hybrid models of VAEs and GANs, IntroVAE requires no extra discriminator, because the inference model itself serves as a discriminator to distinguish between generated and real samples. Experiments demonstrate that our method produces high-resolution photo-realistic images (e.g., CELEBA images at 1024 × 1024) that are comparable to or better than those of state-of-the-art GANs.

2. We present two methods based on variational representations for facial image synthesis and analysis. The first is the disentangled discriminative variational autoencoder (D2VAE), which factors the latent representation into a set of semantic units, each correlated with a certain attribute label. We minimize the divergence of a semantic unit from a given prior distribution when the corresponding label is positive, and maximize the divergence when the label is negative, so as to associate the semantic unit with a specific label and make the model both discriminative and generative. We employ mutual information minimization to further disentangle the semantic units correlated with different attributes. We also treat photorealism as an additional attribute and introduce introspective adversarial learning to improve image quality. Experiments show that the proposed model not only achieves promising performance on facial attribute prediction but also improves the diversity and controllability of high-resolution image synthesis. The second method is a novel approach that uses a disentangled variational representation (DVR) for cross-modal matching. We model a face representation with intrinsic identity information and its within-person variations. By exploring the disentangled latent variable space, a variational lower bound is employed to optimize the approximate posterior for near-infrared (NIR) and visible (VIS) representations. To obtain a more compact and discriminative disentangled latent space, we also impose a minimization of the identity information for the same subject and a relaxed correlation alignment constraint between the NIR and VIS modality variations. Extensive experiments demonstrate that the proposed method achieves significant improvements over state-of-the-art methods.

3. We propose three wavelet-based facial image restoration methods: Wavelet-domain Super-resolution CNN (WaveletSRCNN), Wavelet-domain Super-resolution GAN (WaveletSRGAN), and Wavelet-domain Deep Zoom Network (WDZNet). The first learns to predict the wavelet information of high-resolution (HR) face images from their corresponding low-resolution (LR) inputs before image-level super-resolution, unlike most existing studies, which hallucinate faces in the image pixel domain. It is able to capture both the global topology information and the local texture details of human faces with a flexible and extensible convolutional neural network optimized by a wavelet-domain loss. The second method extends wavelet-domain face hallucination from convolutional neural networks to generative adversarial networks. Two additional losses are employed: a wavelet adversarial loss that aims to generate realistic wavelets, and an identity-preserving loss that aims to help recover identity information. Extensive experiments demonstrate that it not only achieves more appealing results, both quantitatively and qualitatively, than state-of-the-art face hallucination methods, but also significantly improves recognition accuracy for low-resolution face images. The last method, WDZNet, is presented for the general facial image restoration problem. We first build a large-scale facial portrait dataset that contains more than 45,000 images of 210 identities, covering variations in lighting, distance, glasses, expression, age, and race. We then propose an end-to-end wavelet-domain deep zoom network that exploits the information shared between different wavelet bands to recover texture details. The facial parsing map is also included as a model input to alleviate misalignment problems. Experimental results show that our WDZNet, trained on the collected dataset, significantly improves restoration robustness on low-quality facial images captured in the wild.
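
As a rough illustration of the divergence rule described for D2VAE (pull a semantic unit toward the prior when its attribute label is positive, push it away when the label is negative), the following minimal Python sketch may help. It assumes a diagonal Gaussian posterior and a standard normal prior; the function names and the hinge with a `margin` are hypothetical choices made here so that the "maximize" branch stays bounded, and they are not taken from the dissertation.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one semantic unit."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))


def semantic_unit_loss(mu, logvar, label, margin=10.0):
    """Loss for one semantic unit given its binary attribute label.

    label == 1: minimize the KL divergence to the prior.
    label == 0: encourage a large divergence. The hinge on `margin` is an
                assumption of this sketch, used to keep the term bounded.
    """
    kl = kl_to_standard_normal(mu, logvar)
    if label == 1:
        return kl
    return max(0.0, margin - kl)


if __name__ == "__main__":
    mu, logvar = [1.0], [0.0]
    print(semantic_unit_loss(mu, logvar, label=1))              # positive label
    print(semantic_unit_loss(mu, logvar, label=0, margin=2.0))  # negative label
```

With this rule, the divergence of a unit from the prior can itself be read as an attribute score at test time, which is one plausible way a single model could be both generative and discriminative.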
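
The idea of a wavelet-domain loss, as mentioned for the super-resolution methods above, can be sketched as follows. This is only an illustrative example: it assumes a single-level orthonormal Haar transform and a per-band mean squared error with uniform weights, none of which is specified in the abstract. The names `haar_dwt2` and `wavelet_loss` and the band labels are hypothetical.

```python
def haar_dwt2(img):
    """Single-level 2D orthonormal Haar transform.

    `img` is a list of rows with even height and width. Returns a dict of
    four half-size sub-bands: a low-frequency approximation and three
    detail bands (differences across rows, across columns, and diagonal).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands = {"approx": [], "row_diff": [], "col_diff": [], "diag": []}
    for i in range(0, h, 2):
        rows = {name: [] for name in bands}
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows["approx"].append((a + b + c + d) / 2.0)
            rows["row_diff"].append((a + b - c - d) / 2.0)
            rows["col_diff"].append((a - b + c - d) / 2.0)
            rows["diag"].append((a - b - c + d) / 2.0)
        for name in bands:
            bands[name].append(rows[name])
    return bands


def wavelet_loss(pred, target, weights=None):
    """Weighted sum of per-band mean squared errors between two images."""
    p, t = haar_dwt2(pred), haar_dwt2(target)
    weights = weights or {name: 1.0 for name in p}
    total = 0.0
    for name in p:
        diffs = [(x - y) ** 2
                 for p_row, t_row in zip(p[name], t[name])
                 for x, y in zip(p_row, t_row)]
        total += weights[name] * sum(diffs) / len(diffs)
    return total


if __name__ == "__main__":
    pred = [[1, 2], [3, 4]]
    target = [[0, 0], [0, 0]]
    print(haar_dwt2(pred))
    print(wavelet_loss(pred, target))
```

Raising the weights of the detail bands in such a loss would penalize missing texture more heavily than a plain pixel-domain error, which is one way to read the claim that wavelet-domain training captures both global topology and local texture.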
Keywords/Search Tags: Facial Image Synthesis, Facial Image Analysis, Introspective Adversarial Learning, Variational Representation Learning, Wavelet Transform