
Research On Image Generation Models Based On Generative Adversarial Networks

Posted on: 2024-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Huang
GTID: 2568306932462344
Subject: Software engineering
Abstract/Summary:
In recent years, with the development of deep learning, an increasing number of studies have used neural networks to generate images. Neural network-based image generation methods can learn the data distribution from a large amount of data, generate realistic and diverse images, and be trained without supervision. Image generation is used in data augmentation, medical image analysis, art design, and virtual reality, so studying it has strong practical significance for many fields. As one of the most influential generative models of recent years, the Generative Adversarial Network (GAN) has achieved great success in image generation, because adversarial training lets a GAN continually improve its modeling capability and ultimately generate photo-realistic images. Traditional GANs commonly use fully connected neural networks, which are difficult to train. Replacing fully connected layers with convolutional layers has improved the quality and diversity of generated images, but the receptive field of a convolutional neural network is typically small, which limits its ability to capture global features. Other work has directly replaced convolutions with self-attention modules, but this approach does not fully exploit the spatial information of visual signals. This dissertation proposes model structures that address these problems in image generation and produce higher-quality images. The main contributions of this dissertation are as follows:

1. A GAN architecture based on Vision Transformers is designed for image generation. The discriminator uses a hybrid of ResNet50 and a Vision Transformer, so that it captures both global and local features. In addition, to overcome the poor generator performance caused by using the same attention matrix for every channel, an enhanced multi-head attention mechanism is proposed that fixes the depth at different hidden channel dimensions and increases the number of heads to improve the model's representation capability. Finally, the proposed structure is validated on multiple datasets, demonstrating that it effectively improves the quality and diversity of generated images.

2. A hierarchical GAN model based on shifted windows is proposed. The previous model has two shortcomings: patches cannot communicate with one another, and multi-scale features cannot be acquired. This dissertation improves on it with strategies such as shifted windows and window masks. Applying window partitioning and shifted windows in both the generator and the discriminator significantly reduces the computational complexity of attention while allowing attention windows to communicate with one another. In addition, a shifted-window attention mask is incorporated into the attention module, which enables attention to be computed for all windows in one batch while preserving the overall semantic information of the feature map. Quantitative and qualitative experiments show that the model improves the quality and diversity of generated images compared with the previous model. The attention mechanism proposed in this dissertation also exhibits good feature extraction capability, which is verified through attention heatmaps.
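
The shifted-window attention mask described in the second contribution can be illustrated with a minimal NumPy sketch. It follows the construction commonly used in Swin Transformer-style models (cyclic shift, window partitioning, and an additive mask that blocks attention between positions that were not neighbors before the shift). Whether the dissertation uses exactly this construction is an assumption: the function names, the -100 mask value, and the single-head attention with queries, keys, and values all set to the input features are hypothetical choices made only for illustration.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, ...) map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws*ws, ...).
    """
    H, W = x.shape[:2]
    tail = x.shape[2:]
    x = x.reshape(H // ws, ws, W // ws, ws, *tail)
    x = x.transpose(0, 2, 1, 3, *range(4, 4 + len(tail)))
    return x.reshape(-1, ws * ws, *tail)

def window_reverse(wins, ws, H, W):
    """Inverse of window_partition for (num_windows, ws*ws, C) input."""
    C = wins.shape[-1]
    x = wins.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def shifted_window_mask(H, W, ws, shift):
    """Additive mask of shape (num_windows, ws*ws, ws*ws).

    After a cyclic shift, some windows contain positions that come from
    opposite borders of the map. Each such region gets its own label, and
    pairs of positions with different labels receive a large negative
    value (assumed here to be -100) so softmax gives them ~0 weight.
    """
    region = np.zeros((H, W), dtype=np.int64)
    slices = (slice(0, -ws), slice(-ws, -shift), slice(-shift, None))
    label = 0
    for rows in slices:
        for cols in slices:
            region[rows, cols] = label
            label += 1
    win = window_partition(region, ws)           # (nW, ws*ws)
    diff = win[:, None, :] - win[:, :, None]     # (nW, ws*ws, ws*ws)
    return np.where(diff != 0, -100.0, 0.0)

def shifted_window_attention(feat, ws, shift):
    """Single-head self-attention inside shifted windows.

    feat has shape (H, W, C); q = k = v = feat (illustrative only).
    All windows are processed in one batched matrix product.
    """
    H, W, C = feat.shape
    shifted = np.roll(feat, (-shift, -shift), axis=(0, 1))  # cyclic shift
    wins = window_partition(shifted, ws)                    # (nW, ws*ws, C)
    scores = wins @ wins.transpose(0, 2, 1) / np.sqrt(C)
    scores = scores + shifted_window_mask(H, W, ws, shift)
    scores = scores - scores.max(axis=-1, keepdims=True)    # stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    out = window_reverse(attn @ wins, ws, H, W)
    return np.roll(out, (shift, shift), axis=(0, 1))        # undo the shift

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(8, 8, 3))
    print(shifted_window_attention(feat, ws=4, shift=2).shape)
```

In this sketch, an 8 x 8 feature map with window size 4 and shift 2 yields four windows. The first window holds only positions that were adjacent before the shift, so its mask is all zeros, while the other three windows mix regions and have some position pairs masked out. Because the mask is simply added to the attention scores, all windows can share one batched computation, which is the property the abstract attributes to the shifted-window attention mask.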
Keywords/Search Tags: Image generation, Generative adversarial network, Multi-head attention mechanism, Transformer, Feature extraction network