
Image Attribute Transfer Methods Based on Generative Models

Posted on: 2021-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhang
Full Text: PDF
GTID: 2518306503972629
Subject: Electronics and Communications Engineering

Abstract/Summary:
Image attribute transfer is an emerging image processing technique whose goal is to transfer one or more attributes of an image according to the user's requirements while preserving the quality, fidelity and diversity of the generated images. It extends to a wide range of applications, including film-making, photo editing and e-commerce. In early years, the main form of image attribute transfer was neural style transfer, which trains a convolutional neural network for a fixed objective. Because conventional convolutional neural networks require a definite training objective, they cannot be applied to other image attribute transfer tasks. With the continuous development of generative adversarial networks, an increasing number of image attribute transfer tasks have become achievable, including image translation, facial attribute transfer and clothing transfer. Despite the good results achieved by existing methods, the quality of the generated samples remains to be improved. We therefore experiment on facial attribute transfer and clothing transfer, which have the most promising applications among image attribute transfer tasks, and propose a facial attribute transfer model based on multi-scale feature fusion and a fused-attention model for clothing transfer.

Existing facial attribute transfer methods mostly employ the L1 norm as the reconstruction loss, which not only smooths the images overall but also restricts the network's reasoning. In addition, conventional single-scale generators cannot provide enough context information for fine-grained synthesis. To address these problems, we introduce the Multi-Scale Structural Similarity Index (MS-SSIM) as the cycle consistency loss for reconstructing images. Unlike the L1 norm, MS-SSIM measures overall structural similarity rather than pixel-wise discrepancy, which effectively avoids over-smoothing and aligns better with the human visual system. MS-SSIM also relaxes the cycle consistency constraint, giving the network more freedom to reason and hallucinate (a minimal sketch of this loss follows below). To further improve sample quality, we develop a multi-scale feature fusion module based on atrous convolutions and incorporate it into the generator. The module serves as an effective global contextual prior for fusing multi-scale features, which greatly facilitates fine-grained texture and color synthesis. It is also an efficient feature learning technique, because it enlarges the receptive field of the network without additional computation or parameters (see the ASPP-style sketch below).

Existing clothing transfer methods are ineffective at capturing long-range correlations and fail to make full use of semantic information. To address these problems, we propose a coarse-to-fine two-stage generative adversarial network and incorporate soft attention, self-attention and stylized channel-wise attention into the second-stage generator. The soft attention layer strengthens the correlation between generated images and language descriptions by enabling each location on the feature maps to search for the most relevant words in the sentence, which effectively reinforces fine-grained text-to-image synthesis. The self-attention layer complements the locality of conventional convolutions by explicitly modeling long-range dependencies in images, which strengthens fine-grained synthesis as well as global consistency and coherence.
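To make the reconstruction objective concrete, the following is a minimal sketch of an MS-SSIM cycle consistency loss, assuming PyTorch and the third-party pytorch_msssim package; the function name and the CycleGAN-style usage are illustrative assumptions, not the thesis implementation.

```python
import torch
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

def cycle_consistency_loss(reconstructed: torch.Tensor,
                           original: torch.Tensor) -> torch.Tensor:
    """1 - MS-SSIM between a twice-translated image and its source.

    Unlike an L1 penalty, MS-SSIM compares luminance, contrast and
    structure across several scales, so the generator is not pushed
    toward over-smooth, pixel-averaged reconstructions.
    """
    # Inputs are assumed to be (N, C, H, W) floats in [0, 1] with
    # H, W >= 161, as the default 5-scale MS-SSIM requires.
    return 1.0 - ms_ssim(reconstructed, original,
                         data_range=1.0, size_average=True)

# CycleGAN-style usage with hypothetical generators G: X->Y and F: Y->X:
# loss_cyc = cycle_consistency_loss(F(G(x)), x) \
#          + cycle_consistency_loss(G(F(y)), y)
```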
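The multi-scale feature fusion module can be pictured as an ASPP-style block: parallel 3x3 atrous convolutions with different dilation rates view the same input at several receptive-field sizes, and a 1x1 convolution fuses the branches. The rates, normalization and widths below are assumptions for illustration, not the exact thesis architecture.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """ASPP-style fusion block built from atrous (dilated) convolutions.

    Each branch keeps the spatial resolution (padding == dilation for a
    3x3 kernel) while covering a different receptive-field size; a 1x1
    convolution then fuses the concatenated multi-scale features back
    to the input width, so the block can be dropped into a generator.
    """

    def __init__(self, channels: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.InstanceNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))
```

A dilated 3x3 convolution has the same parameter count as an ordinary 3x3 convolution, which is how each branch enlarges the receptive field without extra parameters.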
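The self-attention layer follows the widely used SAGAN formulation, in which every spatial position attends to every other position; the channel-reduction factor of 8 and the zero-initialized residual gate below are conventional choices, assumed here rather than taken from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over the spatial positions of a
    feature map, complementing the locality of 3x3 convolutions."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project to query/key/value spaces.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Zero-initialized gate: the layer starts as the identity and
        # gradually learns how much attention to mix in.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C//8)
        k = self.key(x).flatten(2)                     # (B, C//8, HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)      # (B, HW, HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection
```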
The stylized channel-wise attention layer establishes feature correlations through channel-wise inner products and recalibration of the feature maps; it not only facilitates texture synthesis and delicate colorization but also encourages reasonable hallucination by the network (one possible formulation is sketched below).

In the experimental stage, we choose the Fréchet Inception Distance (FID) for quantitative evaluation and validate the effectiveness of our methods through quantitative experiments. We also conduct several qualitative experiments and user studies to demonstrate the high quality of our results, which convincingly shows that our methods outperform the state-of-the-art methods.
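One plausible reading of the stylized channel-wise attention described above is a Gram-style correlation between flattened channels followed by a recalibration of the feature map; the sketch below illustrates that mechanism under this assumption and is not the exact thesis layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelWiseAttention(nn.Module):
    """Channel attention via channel-wise inner products.

    Flattened channels are correlated into a (C, C) map (the same
    inner product that underlies a Gram/style matrix); a softmax over
    that map then recalibrates each channel by its most related ones.
    """

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # residual gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        f = x.flatten(2)                           # (B, C, HW)
        corr = torch.bmm(f, f.transpose(1, 2))     # (B, C, C) inner products
        attn = F.softmax(corr, dim=-1)
        out = torch.bmm(attn, f).view(b, c, h, w)  # recalibrated channels
        return self.gamma * out + x
```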
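For reference, FID compares Inception-feature statistics of real and generated images, with lower values indicating closer distributions. Below is a hedged sketch of how such an evaluation is commonly set up with the torchmetrics package; the random tensors stand in for the actual test sets.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance  # pip install "torchmetrics[image]"

# normalize=True tells the metric to expect float images in [0, 1].
fid = FrechetInceptionDistance(feature=2048, normalize=True)

# Stand-in batches; in practice these would be real photographs and
# generator outputs at the same resolution.
real_images = torch.rand(32, 3, 299, 299)
fake_images = torch.rand(32, 3, 299, 299)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {fid.compute().item():.2f}")  # lower is better
```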
Keywords/Search Tags: generative adversarial networks, image attribute transfer, multi-scale feature fusion, fused attention mechanism