
An Ensembled Generation-retrieval Method For Image Captioning

Posted on: 2021-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: C P Xu
GTID: 2518306107968529
Subject: Control Engineering
Abstract/Summary:
Image captioning, which aims at generating a natural language description of an image, is an important yet challenging task in scene understanding. It connects computer vision with natural language processing: a model must not only capture the objects in an image and their complex relationships, but also possess the linguistic capability to describe what it sees. The task has broad application prospects in image retrieval, intelligent human-computer interaction, and assistance for people with visual impairment.

Traditional image captioning methods are retrieval-based: they select suitable sentences from a pre-constructed image-caption repository to serve as the description of the target image. Although the retrieved sentences are fluent and free of grammatical errors, they are limited by the capacity of the repository and may not be tailored to the query image. In recent years, benefiting from the rapid development of deep learning, generation-based methods built on the encoder-decoder framework have made great progress and can generate sentences freely and flexibly, but the generated captions often lack fluency, diversity, and informativeness. To address these problems, this thesis combines retrieval-based and generation-based methods to study automatic image captioning. The main works and contributions are as follows:

1. A novel generative adversarial learning framework for image captioning is proposed, combining retrieval-based and generation-based methods. The discriminator of the generative adversarial network takes the retrieved sentence as a reference, distinguishes the sentence produced by the generator from the human-annotated sentence, and judges how well the sentence matches the image. With the help of the retrieved sentence, the discriminator can assess sentence quality more accurately and pass a more reliable score back to the generator, thereby improving the quality of the generated captions (a code sketch of this retrieval-referenced discriminator is given below).

2. To increase the fluency and informativeness of the generated captions, the thesis introduces the copying mechanism, widely used in natural language processing, into image captioning. Through the copying mechanism, the model can automatically copy appropriate words from the retrieved sentences into the generated sentence (see the sketch below). In addition, the semantic information contained in the retrieved sentences is used to enhance the existing attention mechanism, guiding the model to generate more appropriate captions.

3. To verify the performance of the proposed model, experiments are performed on the widely used COCO image captioning dataset. Ablation experiments verify the effectiveness of each module. Compared with other advanced image captioning models (e.g., Soft-Attention, Adaptive Attention, SCST, StackCap, Up-Down, DHEDN), the proposed model achieves notable improvements on the standard image captioning metrics (BLEU, METEOR, ROUGE, CIDEr), which demonstrates its effectiveness.
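The following is a minimal PyTorch sketch of the retrieval-referenced discriminator described in contribution 1. All module choices, names, and dimensions (a shared GRU sentence encoder, 2048-dimensional image features) are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn

class RetrievalRefDiscriminator(nn.Module):
    """Scores a candidate caption against an image, using a retrieved
    caption as an additional reference signal (assumed architecture)."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One encoder is shared by the candidate and the retrieved sentence.
        self.sent_enc = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        # The score is conditioned on image, candidate, and reference jointly.
        self.score = nn.Sequential(
            nn.Linear(hidden_dim * 3, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, tokens):
        _, h = self.sent_enc(self.embed(tokens))  # h: (1, B, hidden_dim)
        return h.squeeze(0)

    def forward(self, img_feat, candidate, retrieved):
        v = self.img_proj(img_feat)   # image representation
        c = self.encode(candidate)    # generated or human-annotated caption
        r = self.encode(retrieved)    # retrieved reference sentence
        logit = self.score(torch.cat([v, c, r], dim=-1))
        # Probability that the caption is human-written and matches the image.
        return torch.sigmoid(logit)

Conditioning the score on the retrieved sentence is what lets the discriminator use a fluent, image-relevant reference when judging the generator's output, as the abstract describes.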
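The copying mechanism of contribution 2 can likewise be sketched as a pointer-generator-style mixing step: at each decoding step, the decoder's vocabulary distribution is blended with an attention distribution over the tokens of the retrieved sentence. All shapes and names below are assumptions for illustration.

import torch
import torch.nn.functional as F

def copy_mix(vocab_logits, copy_scores, retrieved_ids, p_gen):
    """vocab_logits:  (B, V) decoder scores over the vocabulary
       copy_scores:   (B, T) attention scores over retrieved-sentence tokens
       retrieved_ids: (B, T) vocabulary ids of those tokens
       p_gen:         (B, 1) probability of generating rather than copying"""
    p_vocab = F.softmax(vocab_logits, dim=-1)
    p_copy_src = F.softmax(copy_scores, dim=-1)
    # Scatter the copy attention back onto vocabulary positions, so a word
    # appearing in the retrieved sentence receives extra probability mass.
    p_copy = torch.zeros_like(p_vocab).scatter_add_(1, retrieved_ids, p_copy_src)
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy

Words strongly attended to in the retrieved sentence thus become more likely in the output, which is how copying can raise the fluency and informativeness of generated captions.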
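Finally, the metrics in contribution 3 (BLEU, CIDEr, etc.) are commonly computed with the pycocoevalcap package; whether the thesis used this exact tool is an assumption, and the captions below are invented examples rather than COCO data.

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# image_id -> list of reference captions / list with one generated caption
gts = {"391895": ["a man riding a bicycle down a dirt road"]}
res = {"391895": ["a man rides a bike on a dirt road"]}

bleu_scores, _ = Bleu(4).compute_score(gts, res)  # BLEU-1 .. BLEU-4
cider_score, _ = Cider().compute_score(gts, res)
print(bleu_scores, cider_score)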
Keywords/Search Tags: Image captioning, Generative adversarial network, Copying mechanism, Image retrieval