
Research On Image Description Generation Based On Visual Attention

Posted on: 2021-05-30    Degree: Master    Type: Thesis
Country: China    Candidate: K X Fan    Full Text: PDF
GTID: 2428330623468548    Subject: Engineering
Abstract/Summary:
With the rapid development of computer vision and natural language processing, cross-disciplinary tasks spanning these two fields have attracted increasing attention from researchers. This paper focuses on image captioning, a visual captioning task defined as follows: given an image, the model is required to automatically generate a natural language description of it. Image captioning can be applied in many real-world scenarios, for example helping the visually impaired, improving the accuracy of image retrieval, and assisting human-computer interaction.

Most traditional image captioning models adopt an encoder-decoder structure combined with an attention mechanism. This framework has achieved strong results, but several problems remain. First, in traditional models the text description produced by the decoder is taken directly as the final result, so these methods lack a deliberation process. Second, exposure bias exists in the encoder-decoder structure. Finally, traditional models focus only on the accuracy of the generated captions, so images with similar content may end up with identical descriptions.

To address these problems, this paper designs an image description generation system based on a deliberation attention mechanism. The proposed system consists of three parts. First, the model implements deliberation with two layers of residual attention: the first-pass residual-based attention layer prepares the hidden states and visual attention for generating a preliminary version of the caption, while the second-pass deliberate residual-based attention layer refines them. By introducing this deliberation process, the model generates more accurate descriptions. Second, this paper combines a cross-modal retrieval method with reinforcement learning to address the low discriminability of traditional image captioning models; the reinforcement learning module also alleviates the mismatch between the data flow during training and testing, i.e., the exposure bias problem. Finally, the experimental results of the proposed model on the MS-COCO and Flickr30K datasets exceed recently published results. Specifically, the model improves the state of the art on MS-COCO, reaching 37.5% BLEU-4, 28.5% METEOR, and 125.6% CIDEr, and it reaches 29.4% BLEU-4 and 66.6% CIDEr on Flickr30K.
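The abstract does not give equations or implementation details, but the two-pass deliberation idea can be illustrated with a minimal PyTorch-style sketch. All names below (AdditiveAttention, DeliberationDecoderStep, the additive attention form, the dimensions, and the placement of the residual connections) are assumptions introduced for illustration, not the author's exact formulation; the thesis model may differ in its attention function, recurrent cells, and how the two passes are combined.

```python
# Minimal sketch of one two-pass (deliberation) decoding step with residual attention.
# Module names, dimensions, and residual placements are illustrative assumptions only.
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention over image region features."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, R, feat_dim) region features; hidden: (B, hidden_dim) query
        e = self.score(torch.tanh(self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)          # (B, R, 1) attention weights
        context = (alpha * feats).sum(dim=1)     # (B, feat_dim) attended visual context
        return context, alpha.squeeze(-1)


class DeliberationDecoderStep(nn.Module):
    """One decoding step: a first-pass layer drafts hidden states and visual attention,
    a second-pass layer re-attends and refines them via residual connections."""
    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size, attn_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.first_cell = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.first_attn = AdditiveAttention(feat_dim, hidden_dim, attn_dim)
        self.second_cell = nn.LSTMCell(hidden_dim + feat_dim, hidden_dim)
        self.second_attn = AdditiveAttention(feat_dim, hidden_dim, attn_dim)
        self.out = nn.Linear(hidden_dim + feat_dim, vocab_size)

    def forward(self, feats, prev_word, state1, state2):
        # First pass: draft hidden state and visual context for a preliminary caption.
        mean_feat = feats.mean(dim=1)
        h1, c1 = self.first_cell(torch.cat([self.embed(prev_word), mean_feat], dim=-1), state1)
        ctx1, _ = self.first_attn(feats, h1)
        # Second pass: refine the draft; residual connections keep the first-pass signal.
        h2, c2 = self.second_cell(torch.cat([h1, ctx1], dim=-1), state2)
        ctx2, _ = self.second_attn(feats, h2 + h1)              # residual on the attention query
        logits = self.out(torch.cat([h2 + h1, ctx2], dim=-1))   # residual on the output path
        return logits, (h1, c1), (h2, c2)


# Example of a single step with random region features (shapes are assumptions).
B, R, H = 2, 36, 512
decoder = DeliberationDecoderStep(feat_dim=2048, embed_dim=512, hidden_dim=H, vocab_size=10000)
feats = torch.randn(B, R, 2048)
prev_word = torch.zeros(B, dtype=torch.long)   # e.g. a <bos> token id
state1 = (torch.zeros(B, H), torch.zeros(B, H))
state2 = (torch.zeros(B, H), torch.zeros(B, H))
logits, state1, state2 = decoder(feats, prev_word, state1, state2)
```

In a full model this step would be unrolled over the caption length, and the reinforcement learning stage described in the abstract would replace cross-entropy training with a sequence-level reward (for example CIDEr) computed on captions sampled from the decoder, which is how exposure bias is typically mitigated; the exact reward and cross-modal retrieval objective used in the thesis are not specified on this page.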
Keywords/Search Tags: computer vision, natural language processing, image captioning, attention mechanism, deliberation