
Research On Adversarial Learning Pluralistic Image Inpainting Algorithm Based On Transformer And Mask Prediction

Posted on: 2024-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: J P Wu
GTID: 2568306929994639
Subject: Computer Science and Technology
Abstract/Summary:
The goal of image inpainting is to fill the missing regions of an image using the known contextual information, so that the filled image has both plausible semantics and clear texture. Most traditional algorithms, based on diffusion or patches, are suitable only for restoring images with small missing areas or simple texture structure; when the missing area is large or the texture structure is complex, the restored image suffers from distorted structure and blurred details. In recent years, deep learning-based inpainting algorithms have been able to produce coherent and complete texture structure, mainly owing to the powerful feature extraction ability of convolutional neural networks. The emergence of encoder networks and generative adversarial networks has provided new frameworks for image inpainting, but these frameworks still do not adequately solve the low semantic consistency and blurred details of restored images, which arise because existing convolutional modules extract features at only a single receptive-field scale. This thesis therefore improves both the generative and the discriminative networks within the generative adversarial network framework and studies a face image inpainting algorithm based on mask prediction and multi-scale context aggregation. In addition, considering the importance of global context for long-range modeling in image inpainting and the user demand for diverse inpainting results, this thesis draws on the idea of fusion learning and studies a pluralistic image inpainting algorithm based on an improved Transformer and fusion learning. Experiments are conducted on the CelebA-HQ face dataset and the Paris StreetView street-view dataset, comparing the results of different inpainting algorithms and analyzing the impact of each module on the inpainting effect. The main work is as follows:

(1) A face image inpainting algorithm based on mask prediction and multi-scale context aggregation is proposed to address the low semantic consistency and blurred details that appear when existing generative adversarial networks repair irregularly damaged face images. To improve the semantic consistency of the restored images, the generative network is replaced by a multi-scale context aggregation generative network, which strengthens contextual inference by stacking eight layers of multi-scale context aggregation modules to capture distant contextual features and rich patterns of interest. The output of each convolutional layer of the encoder is then concatenated along the channel dimension with the input at the corresponding position of the decoder through skip connections, so that the contextual information of the image propagates to higher-resolution feature maps. Finally, the decoder transforms the extracted feature maps to the resolution of the real image. The multi-scale context aggregation generative network is trained with a joint loss function consisting of reconstruction loss, adversarial loss, perceptual loss, and style loss. To generate clear textures, the discriminative network is also improved, and a mask prediction discriminative network is proposed. This discriminative network distinguishes the texture details of small patches (missing regions) of the real image from those of small patches of the restored image; for a restored image, it is expected to segment the synthesized patches from the real ones. This learning objective yields a stronger discriminative network, which in turn drives the generative network to restore clear, fine-grained textures. A qualitative and quantitative comparison between four benchmark models and the model of this thesis shows that the proposed algorithm improves both visual perception and quantitative evaluation metrics.

(2) Existing Transformer models obtain global contextual information for long-range modeling through the attention mechanism without distinguishing invalid pixels from valid pixels. As a result, when they restore images with irregular missing regions, the results are semantically ambiguous and lack diversity. To address this problem, this thesis proposes a pluralistic image inpainting algorithm based on an improved Transformer and fusion learning. A Transformer module designed for image inpainting, the multi-head contextual attention module, is used to build the backbone network. This module models long-range global contextual information and computes attention scores only for valid pixels, which effectively resolves the semantic ambiguity caused by invalid pixels. In addition, a style transfer module and fusion learning are used to generate diverse restored images. Experiments demonstrate that the proposed algorithm can restore not only face images but also natural street scenes, and that it can generate pluralistic inpainting results.

(3) The two lines of research above are combined and optimized, and an adversarial learning pluralistic image inpainting algorithm based on Transformer and mask prediction is proposed. The effectiveness of the model is verified on a public face dataset.
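
The idea of computing attention scores only for valid pixels can be illustrated with a minimal sketch. This is not the thesis's multi-head contextual attention module, whose details the abstract does not give; it is a single-head, pure-Python scaled dot-product attention in which tokens from missing regions are excluded as keys. The function name `masked_attention`, the token layout, and the `valid` flag list are all assumptions made for illustration.

```python
import math


def masked_attention(queries, keys, values, valid):
    """Single-head scaled dot-product attention restricted to valid tokens.

    queries, keys, values: lists of float vectors, one vector per token.
    valid: list of bools, True where a token comes from known pixels.
    Hypothetical helper for illustration, not the thesis's module.
    """
    if not any(valid):
        raise ValueError("at least one valid token is required")
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Invalid keys get a score of -inf, so softmax assigns them weight 0.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            if ok else -math.inf
            for k, ok in zip(keys, valid)
        ]
        peak = max(scores)  # finite, because at least one key is valid
        exps = [math.exp(s - peak) for s in scores]  # exp(-inf) is 0.0
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted average of the valid value vectors only.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    valid = [True, True, False]  # third token lies in the missing region
    print(masked_attention(tokens, tokens, tokens, valid))
```

Because the invalid token receives zero weight, changing its content leaves every output unchanged, which is the property that keeps missing-region pixels from contaminating the aggregated context.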
Keywords/Search Tags:Image Inpainting, Generative Adversarial Network, Multi-scale Contextual Aggregation, Mask Prediction, Transformer, Pluralistic Image Inpainting