
Image Captioning Based On Generative Adversarial Network

Posted on: 2019-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: F Lv
Full Text: PDF
GTID: 2428330548953228
Subject: Optical Engineering

Abstract/Summary:
Image captioning aims to generate text descriptions of the objects and scenes present in an image. Traditional approaches, including template-based, feature-matching-based, and CNN-RNN-based methods, suffer from the exposure bias problem: the generated sentences remain far from natural language and lack diversity, so these methods struggle to caption images effectively. To overcome exposure bias, we introduce the generative adversarial network and exploit its adversarial training mechanism, which encourages the generated captions to fit the true data distribution. However, directly applying a generative adversarial network means feeding the generator's discrete output sequence into the discriminator, which makes it difficult to propagate gradients back to the generator. To analyze the relationship between the image and the generated sentence effectively, we further introduce the attention mechanism, treating image captioning as an attention-based sequence generation problem, and, from the perspective of multi-modal analysis, we explore the application of a multi-modal (image and text) attention mechanism to image captioning. The main research work of this thesis is as follows:

(1) A multi-label image classification method based on the attention mechanism. The image captioning problem can be reduced to multi-label image classification. The multiple labels of an image are first regarded as a sequence; image features are extracted by a CNN, and an RNN predicts the labels one at a time. At each RNN step, the next label is predicted from the currently attended image region. Experimental results show that the proposed method improves performance by two to three percentage points.

(2) An accumulated attention mechanism for multi-modal data. Image captioning is also a multi-modal problem: each modality carries its own key information, which existing methods do not analyze jointly. We propose to combine the attention of the different modalities so that they reinforce each other. In this thesis, the accumulated attention mechanism is applied to the visual grounding problem. Experimental results show that the proposed method improves grounding accuracy by about 3% and strengthens the attention information of each modality.

(3) An attention feedback mechanism that sharpens attention. The traditional attention computation is a one-way propagation, which leads to scattered attention and confused generated sentences. Building on the traditional attention mechanism, we construct a feedback channel so that the target described by the output attention is matched back against the input, keeping the attention on the described object. Experiments show that the proposed attention feedback mechanism improves image captioning by about 2 percentage points on the BLEU and METEOR metrics and makes the attention focus more on the key objects in the image.
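To make the contrast between one-way attention and the feedback channel in (3) concrete, the following is a minimal PyTorch sketch of one plausible reading of such a mechanism; the module, layer names, and the cosine-agreement loss are illustrative assumptions rather than the implementation used in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackAttention(nn.Module):
    """Hypothetical sketch of an attention feedback channel: attend over image
    regions with the decoder state, then project the attended context back into
    the word-embedding space and score how well it matches the word that was
    just generated, penalising attention that drifts off the described object."""

    def __init__(self, region_dim: int, hidden_dim: int, embed_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, region_dim)     # decoder state -> visual query
        self.back_proj = nn.Linear(region_dim, embed_dim)  # visual context -> text space

    def forward(self, regions, hidden, word_embed):
        # regions:    (B, R, region_dim) CNN region features
        # hidden:     (B, hidden_dim)    decoder hidden state
        # word_embed: (B, embed_dim)     embedding of the word just generated
        q = self.query(hidden)                                       # (B, region_dim)
        scores = torch.bmm(regions, q.unsqueeze(2)).squeeze(2)       # (B, R)
        alpha = F.softmax(scores, dim=1)                             # attention weights
        context = torch.bmm(alpha.unsqueeze(1), regions).squeeze(1)  # (B, region_dim)
        # feedback channel: the attended context, mapped back to the text space,
        # should agree with the word it is supposed to describe
        feedback_loss = 1.0 - F.cosine_similarity(self.back_proj(context), word_embed).mean()
        return context, alpha, feedback_loss
```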
(4) An image captioning method based on the generative adversarial network. We introduce the generative adversarial network into the captioning model built on the attention feedback mechanism. The generator adopts the multi-modal attention mechanism, and the paired attention information of the image and the text is fed into the discriminator, which judges whether the pair is real or generated, thereby improving the quality of generation. Because the generator's discrete output is non-differentiable when fed directly into the discriminator, we relax it with the Gumbel-Softmax distribution. Experiments show that the proposed method generates more accurate captions while keeping the generated sentences reasonably natural; the BLEU and METEOR scores increase by 2-3%.
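As an illustration of the relaxation described in (4), here is a minimal PyTorch sketch of how a Gumbel-Softmax sample over the vocabulary can replace a hard word choice before it reaches the discriminator; the function name, shapes, and hyper-parameters are assumptions for illustration, not the thesis code.

```python
import torch
import torch.nn.functional as F

def soft_word_embedding(logits, embedding, tau=1.0):
    """Relax the generator's discrete word choice with Gumbel-Softmax so that
    the 'word' passed to the discriminator stays differentiable.
    logits:    (B, V) unnormalised scores over the vocabulary
    embedding: nn.Embedding with weight of shape (V, E)
    Returns a soft word embedding of shape (B, E)."""
    y_soft = F.gumbel_softmax(logits, tau=tau, hard=False)  # (B, V), rows sum to 1
    return y_soft @ embedding.weight                         # expected embedding under y_soft

# usage: gradients flow from the discriminator's (fake) input back to the generator logits
vocab_size, embed_dim = 10_000, 300
emb = torch.nn.Embedding(vocab_size, embed_dim)
logits = torch.randn(4, vocab_size, requires_grad=True)      # generator scores for 4 samples
fake_word = soft_word_embedding(logits, emb, tau=0.5)        # (4, 300), feed to discriminator
fake_word.sum().backward()                                   # logits.grad is populated
```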
Keywords/Search Tags:image captioning, attention mechanism, text generation, generative adversarial network