
Image Generation Method Based On Text Data Enhancement

Posted on: 2024-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: W H Lv
Full Text: PDF
GTID: 2568306926968049
Subject: Engineering
Abstract/Summary:
Generating images from text descriptions has become an important multimodal research direction in recent years, combining natural language processing and computer vision. The task is to generate an image that matches a given text description, thereby translating between two information domains: text and image. Text-to-image generation based on generative adversarial networks is now an important line of research. Its main architecture is a multi-stage generative adversarial network that generates images matching the text description in phases, progressively refining them from low to high resolution. Models built on this architecture have several problems, including long and unstable training, a lack of realism and diversity, and weak text-image matching. The optimized model in this thesis addresses these deficiencies through the following main contributions:

(1) To address the inconsistency between the input text description and the generated image, this thesis proposes a text data augmentation network that forms a data augmentation structure fusing text and image features at the initial stage. By capturing multiple kinds of semantic information from the sentence vector, it supplies updated text feature parameters that carry enough useful information to increase the consistency between text and image.

(2) To address the large number of flawed images among the generated results, ODConv (Omni-Dimensional Dynamic Convolution) replaces the conventional convolution in the upsampling layer, residual layer, and image generation layer of the original network. A multidimensional attention mechanism, applied through a parallel strategy, learns more flexible attention along the four dimensions of the convolution kernel space, so that the convolution operation adapts to the input along every dimension and captures rich contextual information more effectively. The local image features associated with the input word vectors are dynamically selected to guide image generation, giving a more accurate representation.

(3) To address the long training time, an attention model with a shallow parallel network is added to the residual layer; the model width is increased, and the resolution and number of branches are scaled effectively. Image feature information is processed at different resolutions and fused in the later stage of the network, which preserves accuracy while improving computational efficiency. The model is trained and evaluated on the CUB and Oxford-102 datasets. The three mainstream evaluation metrics, IS, FID, and R-Precision, improve significantly, indicating that the model is better suited to the text-to-image task.

(4) To address the lack of Chinese-language multimodal datasets, we build a Chinese face dataset from an English face dataset, with the aim of promoting algorithms that combine Chinese text encoding and image encoding. The BiLSTM text encoder in the network architecture is replaced with a pre-trained Chinese ALBERT encoder to improve the model's ability to encode Chinese text. The Chinese face dataset consists of 30,000 pictures and 300,000 Chinese sentences, with each picture paired with 10 sentences. Experimental results on the Chinese face dataset show that the proposed improvements perform well in terms of authenticity and diversity.
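The four-dimensional kernel attention described in contribution (2) can be illustrated with a small sketch. This is not the thesis implementation: it is a simplified, pure-Python reading of the general ODConv idea, in which n candidate kernels are each scaled by spatial, input-channel, output-filter, and kernel-wise attention and then summed into one input-dependent kernel. The function name `odconv_kernel`, the `heads` dictionary, and the use of plain linear heads on globally pooled features are assumptions made for illustration. The shared hidden layer and softmax temperature used in practical ODConv layers are omitted, and the spatial, channel, and filter attentions are shared across the candidate kernels for brevity.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, ctx):
    # Hypothetical attention head: one weight row per output value.
    return [sum(w * c for w, c in zip(row, ctx)) for row in weights]


def odconv_kernel(x, kernels, heads):
    """Combine n candidate kernels into one input-dependent kernel.

    x:       feature map as nested lists, shape [c_in][H][W]
    kernels: candidate kernels, shape [n][c_out][c_in][k][k]
    heads:   assumed linear heads keyed "spatial" (k*k rows), "channel"
             (c_in rows), "filter" (c_out rows), "kernel" (n rows);
             every row holds c_in weights
    """
    n, c_out = len(kernels), len(kernels[0])
    c_in, k = len(kernels[0][0]), len(kernels[0][0][0])
    # Global average pooling: one context value per input channel.
    ctx = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # The four attentions are computed in parallel from the same context.
    a_s = [sigmoid(v) for v in linear(heads["spatial"], ctx)]  # kernel positions
    a_c = [sigmoid(v) for v in linear(heads["channel"], ctx)]  # input channels
    a_f = [sigmoid(v) for v in linear(heads["filter"], ctx)]   # output filters
    a_w = softmax(linear(heads["kernel"], ctx))                # candidate kernels
    # Scale every candidate kernel along all four dimensions, then sum over n.
    return [[[[sum(a_w[i] * a_f[o] * a_c[c] * a_s[p * k + q]
                   * kernels[i][o][c][p][q] for i in range(n))
               for q in range(k)]
              for p in range(k)]
             for c in range(c_in)]
            for o in range(c_out)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen only for the example.
rng = random.Random(0)
c_in, c_out, n, k, size = 2, 3, 4, 3, 5
x = [rand_matrix(size, size, rng) for _ in range(c_in)]
kernels = [[[rand_matrix(k, k, rng) for _ in range(c_in)]
            for _ in range(c_out)] for _ in range(n)]
heads = {
    "spatial": rand_matrix(k * k, c_in, rng),
    "channel": rand_matrix(c_in, c_in, rng),
    "filter": rand_matrix(c_out, c_in, rng),
    "kernel": rand_matrix(n, c_in, rng),
}
dynamic = odconv_kernel(x, kernels, heads)
# The combined kernel has the shape of a single ordinary kernel.
print(len(dynamic), len(dynamic[0]), len(dynamic[0][0]), len(dynamic[0][0][0]))
```

The resulting kernel would then be applied to the same input with an ordinary convolution, so the only added cost in this sketch is computing the four attention vectors and the weighted sum.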
Keywords/Search Tags: text-to-image generation, generative adversarial networks, parallel attention mechanisms, dynamic convolution, text data enhancement