
Image Captioning Based On Improved Attention

Posted on: 2021-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Li
Full Text: PDF
GTID: 2518306503472094
Subject: Computer technology
Abstract/Summary:
Image captioning draws on knowledge from both Computer Vision and Natural Language Processing. It is a challenging task: the model must not only recognize and detect the objects in an image, but also discover their relationships and finally generate a natural, fluent description. The task matters in both research and application. From the research perspective, studying image captioning takes us closer to the ultimate target of true image understanding; from the application perspective, image captioning can help visually impaired people perceive their environment and improve the efficiency of image-text retrieval and image labeling.

The prevailing approach to image captioning applies attention within the encoder-decoder framework. Researchers have attempted to improve it from many angles, for instance by using multiple stages to refine the description, modifying the decoder structure, or using better encoders to extract semantic features or attributes. Attention has become a widely used mechanism in artificial intelligence. In essence, it is simply a weighted sum over a set of features, where each weight expresses the degree of focus; despite its simplicity, it can bring large gains. This thesis focuses on attention in image captioning.

We propose two ways to improve attention. First, we argue that the information in the previous attention result can guide the attention at the next step, so we propose recurrent attention, including naive recurrent attention, gated recurrent attention, and LSTM-based recurrent attention. Second, we observe that common attention does not explicitly consider the relations among features, so we propose self-attention to model these relations and obtain a representation with global information, and we study how to combine self-attention with common attention.

Experiments on the MSCOCO dataset suggest that naive recurrent attention cannot efficiently exploit the information from the previous time step; gated recurrent attention slightly improves the evaluation-metric scores, but the improvement is limited; and LSTM-based recurrent attention markedly improves the performance of the captioning model, with the gain not coming merely from its three-layer structure. Self-attention by itself improves the model when used as an attention mechanism under the encoder-decoder framework; the sequential fusion method makes the model harder to learn and train, while the parallel fusion method achieves the best results, competitive with state-of-the-art models.
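The mechanisms discussed above can be illustrated in a few lines of NumPy. This is a minimal sketch, not the thesis's implementation: the additive (tanh-based) scoring function, the scaled dot-product form of self-attention, all dimensions, and the way the gate mixes the two attention results are assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def common_attention(features, query, W_f, W_q, v):
    # Common attention: score each image-region feature against the decoder
    # query, normalize with softmax, and return the weighted sum (the weight
    # on each region expresses the degree of focus).
    # features: (n, d) region features; query: (h,) decoder hidden state.
    scores = np.tanh(features @ W_f + query @ W_q) @ v   # (n,)
    weights = softmax(scores)
    return weights @ features, weights                   # context: (d,)

def self_attention(features, W_qk):
    # Self-attention over the feature set itself: every region attends to
    # every other region, yielding a representation with global information
    # that common attention does not model explicitly.
    q = k = features @ W_qk
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))       # (n, n)
    return attn @ features                               # (n, d)

def gated_recurrent_attention(ctx_t, ctx_prev, W_g, state):
    # Sketch of the gated recurrent-attention idea: a learned sigmoid gate
    # mixes the current attention result with the previous step's, letting
    # past attention guide the present (gate form is an assumption).
    g = 1.0 / (1.0 + np.exp(-(state @ W_g)))             # scalar in (0, 1)
    return g * ctx_t + (1.0 - g) * ctx_prev
```

In the LSTM-based variant described in the abstract, the mixing above would instead be performed by an LSTM cell carrying the attention history across decoding steps.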
Keywords/Search Tags:Image Captioning, Attention, Recurrent Attention, Self-attention