Image Captioning Based On Attention Long Short-Term Memory Network

Posted on:2021-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Shen

GTID:2428330614953800

Subject:Information and Communication Engineering

Abstract/Summary:

Image captioning translates an image into a natural-language sentence that humans can understand. Unlike traditional image classification or object detection, image captioning is a challenging problem with broad applications, such as early childhood education, image retrieval, and assistance for visually impaired people. It requires not only recognizing the objects, scenes, and attributes in an image but also understanding the relationships among them. Encoder-decoder frameworks for image captioning have advanced rapidly in recent years, driven by the development of deep learning and by the success of encoder-decoder frameworks in machine translation. The main contributions of this thesis are as follows:

(1) We propose an Attention Long Short-Term Memory network (ALSTM) for image captioning. To address the problem that the Long Short-Term Memory network (LSTM) acquires inaccurate information at each time step, the ALSTM uses the previous hidden state to control the input information. We combine the ALSTM with four classical image captioning architectures and evaluate the resulting models on image captioning datasets to verify its effectiveness.

(2) We propose an attention mechanism based on object regions for image captioning. By studying image captioning algorithms, we find that including clear semantic object information in each sub-region of the image improves captioning accuracy; that is, correctly obtaining the object region information in an image is fundamental to image captioning. Building on the ALSTM captioning model, we use Faster R-CNN to extract object region information and apply the attention mechanism to these object regions. We then use a reinforcement learning method to optimize the training of the image captioning architectures, and several experiments demonstrate the effectiveness of the reinforcement learning method and the object region features.
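
The abstract does not give the ALSTM equations, so the following is only an illustrative sketch of the two ideas it names: a gate computed from the previous hidden state that scales the input at each time step, and soft attention over object region features. The function names (`gate_input`, `attend_regions`), the weight matrices `W_g` and `W_a`, the sigmoid gate, and the dot-product scoring are all assumptions made for illustration, not the formulation used in the thesis.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate_input(x_t, h_prev, W_g):
    """Scale each input dimension by a gate in (0, 1) computed from h_prev.

    Assumed form: gate = sigmoid(W_g @ h_prev); gated input = gate * x_t.
    """
    gate = [sigmoid(z) for z in matvec(W_g, h_prev)]
    return [g * x for g, x in zip(gate, x_t)]


def attend_regions(regions, h_prev, W_a):
    """Soft attention over region feature vectors.

    Assumed form: score_i = (W_a @ h_prev) . region_i, followed by a softmax
    over regions and a weighted sum of the region features.
    """
    query = matvec(W_a, h_prev)
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    top = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(regions[0])
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return context, weights


# Toy usage: a zero hidden state gives a gate of 0.5 and uniform attention.
identity = [[1.0, 0.0], [0.0, 1.0]]
gated = gate_input([2.0, 4.0], [0.0, 0.0], identity)
context, weights = attend_regions([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], identity)
```

In a full captioning model, the gated input and the attended context would feed a standard LSTM cell at each decoding step, and the region vectors would come from an object detector such as Faster R-CNN; both are omitted here to keep the sketch self-contained.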

Keywords/Search Tags:

image captioning, attention mechanism, Long Short-Term Memory networks, object detection

Related items

1	Image To Language:Auto Image Captioning Using Bi-directional LSTM And Deep Attention Neural Networks
2	Research And Implementation Of Image Captioning Technology Based On Deep Learning
3	Research On Image Captioning Method Based On Deep Neural Networks And Adaptive Attention Mechanism
4	Research On Image Captioning Algorithm Based On Deep Learning
5	Research On Visual Semantic Graph Construction And Its Application In Image Captioning
6	Image Captioning Based On Adaptive Visual Attention Mechanism
7	Research On Image Caption Method Based On High Level Semantic Extraction And Attention Mechanism
8	Research On Intelligent Semantics Generation For Visual Data
9	Research On Image Captioning Generation Based On Faster R-CNN And Visual Attention
10	Research On Relation Classification Via Bidirectional Long Short-Term Memory Networks With Attention Mechanism