Research On Text Images Based On Generative Adversarial Network

Posted on: 2023-08-22    Degree: Master    Type: Thesis
Country: China    Candidate: Y He    Full Text: PDF
GTID: 2568306815462184    Subject: Electronic information
Abstract/Summary:
Cross-modal learning has been an active area of deep learning research in recent years. One of its most popular subfields, text-to-image translation, remains a challenging task that requires algorithms combining natural language processing and computer vision. In this paper, the generative adversarial network is used as the basic model to study text-to-image generation. The model must understand the relationships between pieces of information in long and difficult texts; the decoder must decode correctly while capturing text semantics and complex background information; and the model must train stably and generate, from scratch, a synthetic image similar to the original image. This paper improves both the generator and the discriminator of the model and introduces a self-attention mechanism and a contrastive learning algorithm. The main work is summarized in the following two aspects:

(1) In the classic generative adversarial network model, results are poor when a long and difficult text is represented only by a global feature description. This paper therefore introduces a self-attention mechanism on top of that model, so that the decoder decodes both global sentence features and word-level features. The text encoder uses a bidirectional LSTM (Long Short-Term Memory), a classic model in natural language processing, to extract the semantic vectors of the text description. The global sentence feature vector is used to generate a low-resolution first-stage image; the output of the first stage is then used as the input of the second stage, where the word feature vectors are added to generate a high-resolution image. The image decoder uses a CNN (deep convolutional neural network), so that the local word features of the sentence and the local sub-regions of the image can be mapped into the same semantic space. In the CNN, the first stage outputs the feature vectors of the local sub-regions, and the global feature vector of the image is obtained in a subsequent stage, so that the model can be trained more stably. In addition, a deep attention similarity model is proposed to calculate the matching degree and similarity between the text and the synthetic image, and to address the problem of local association in the text.

(2) Although introducing the self-attention mechanism improves the stability of model training and the quality of images generated from ordinary text, defects remain: on the large COCO dataset, and even on the CUB dataset, the model still produces "malformed" synthetic images. Semantic association within complex texts remains a challenge. In the training sets of both COCO and CUB, each image is described by 5 or 10 captions, and different wordings can carry very different meanings in different scenes, which also contributes to the poor quality of the attention-based model on complex scenes. Therefore, this paper further introduces contrastive learning to enhance the semantic consistency of the synthetic images. First, in the pre-training stage, contrastive learning is used so that the model learns the textual representations of the captions that describe the same image. Then, in the training stage of the main model, contrastive learning is used again so that the model learns to generate similar images for related captions. In this way, the paper addresses the quality problem of generating images that are consistent with complex text descriptions in the cross-modal domain.
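
The two ideas in the abstract, matching words to image sub-regions through attention and pulling matched text-image pairs together with a contrastive objective, can be illustrated with a small sketch. This is not the thesis's implementation. It assumes word and region features already live in a shared space, uses cosine similarity for matching, and uses an InfoNCE-style loss as a stand-in for the contrastive objective. The function names, the `sharpness` and `temperature` parameters, and the 2-D toy features are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(word, regions, sharpness=4.0):
    """Region-context vector for one word: attention-weighted sum of region features.
    `sharpness` (hypothetical) controls how peaked the attention weights are."""
    weights = softmax([sharpness * cosine(word, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]

def match_score(words, regions):
    """Text-image matching score: mean cosine similarity between each word
    and the region context it attends to."""
    return sum(cosine(w, attend(w, regions)) for w in words) / len(words)

def contrastive_loss(scores, temperature=0.1):
    """InfoNCE-style loss. scores[i][j] is the match between caption i and
    image j; matched pairs are assumed to sit on the diagonal."""
    loss = 0.0
    for i, row in enumerate(scores):
        probs = softmax([s / temperature for s in row])
        loss -= math.log(probs[i])
    return loss / len(scores)

# Toy 2-D features (assumed, for illustration only): each image is a list of
# region vectors, each caption is a list of word vectors.
images = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
captions = [
    [[1.0, 0.1], [0.8, 0.0]],  # words describing image 0
    [[0.0, 1.0], [0.2, 0.9]],  # words describing image 1
]

scores = [[match_score(c, img) for img in images] for c in captions]
aligned = contrastive_loss(scores)                        # correct pairing
shuffled = contrastive_loss([row[::-1] for row in scores])  # swapped pairing
print(scores, aligned, shuffled)
```

In this toy setup the loss is lower when captions are paired with their own images than when the pairing is swapped, which is the behaviour a contrastive objective rewards during training. A real model would compute the features with learned text and image encoders and backpropagate through the loss.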
Keywords/Search Tags: Generative adversarial network, Text-to-image generation, Contrastive learning, Self-attention mechanism