Unpaired Learning Based Evaluation Metric For Image Captioning And Adversarial Generating

Posted on: 2021-05-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Sun | Full Text: PDF
GTID: 2428330614460443 | Subject: Software engineering
Abstract/Summary:
Image captioning is the task of automatically generating a complete, fluent natural-language description of an image from its content, thereby mapping images to language. It is widely used in fields such as image summarization, human-computer interaction, assistance for visually impaired people, and automatic medical report generation. In recent years, image feature extraction and automatic text generation have improved significantly, which provides strong support for the image captioning task. However, as conventional image captioning has reached a high level of performance on traditional evaluation metrics, the research focus has shifted to generating more vivid and stylized sentences.

First, because traditional evaluation methods based on n-gram overlap cannot reasonably evaluate the quality of stylized image captions, we propose an unpaired learning-based image captioning evaluation method (UICE). Unlike existing evaluation methods, UICE directly compares image features with sentence features to measure whether a caption is semantically consistent with the image content. In addition, UICE includes a learning-based grammar module that measures whether the caption is grammatically correct. By feeding different stylized datasets into the training process, the discriminator learns to evaluate captions of different styles. Extensive experiments indicate that UICE can correctly evaluate semantic consistency and syntactic correctness.

Second, on the basis of UICE, we build a new generative adversarial network (GAN) for image captioning that uses UICE as the reward. The generator adopts the traditional encoder-decoder model, but in addition to the global image, it introduces common-sense and prior information from target detection attributes and an additional corpus into the generation process. Because gradients cannot be backpropagated through discrete samples, the model adopts Gumbel sampling in the caption generation process. In the discriminator, the proposed UICE is used to judge the quality of image captions. Experimental results show that the captions generated by the model describe target attributes well and use a larger vocabulary than existing models.
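
The abstract does not say which form of Gumbel sampling the model uses. As an illustration only, the sketch below assumes the common Gumbel-softmax relaxation: Gumbel noise is added to the decoder's word logits, and a temperature-scaled softmax replaces the hard argmax so that the sample stays differentiable with respect to the logits. The function name `gumbel_softmax`, the temperature `tau`, and the toy three-word vocabulary are hypothetical and not taken from the thesis. NumPy is used here only to show the forward computation; in an automatic-differentiation framework the same expression would let gradients from the discriminator reach the generator.

```python
import numpy as np


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a relaxed (soft) one-hot sample from a categorical distribution.

    logits: unnormalized log-probabilities over the vocabulary (last axis).
    tau:    temperature; smaller values push the sample toward one-hot.
    rng:    optional numpy Generator, for reproducibility.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    # Standard Gumbel(0, 1) noise, one value per vocabulary entry.
    noise = rng.gumbel(loc=0.0, scale=1.0, size=logits.shape)
    scores = (logits + noise) / tau
    # Numerically stable softmax over the vocabulary axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    # Toy vocabulary of three words with probabilities 0.1, 0.6, 0.3.
    word_logits = np.log(np.array([0.1, 0.6, 0.3]))
    sample = gumbel_softmax(word_logits, tau=0.5, rng=np.random.default_rng(0))
    print(sample)           # soft one-hot vector that sums to 1
    print(sample.argmax())  # index of the sampled word
```

The argmax of the perturbed logits follows the original categorical distribution (the Gumbel-max trick), so lowering `tau` trades gradient smoothness for samples that are closer to discrete word choices.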
Keywords/Search Tags:Image caption, evaluation metric, unpaired learning, generative adversarial network