
Image Captioning Based On Attention Long Short-Term Memory Network

Posted on: 2021-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Shen
Full Text: PDF
GTID: 2428330614953800
Subject: Information and Communication Engineering
Abstract/Summary:
Image captioning translates an image into a natural-language sentence that humans can understand. Unlike traditional image classification or object detection, image captioning is a challenging problem with broad applications, such as early childhood education, image retrieval, and assistance for visually impaired people. It is challenging because it requires not only recognizing the objects, scenes, and attributes in an image but also understanding the relationships among them. Encoder-decoder frameworks for image captioning have developed rapidly in recent years, owing both to advances in deep learning and to the effective use of encoder-decoder frameworks in machine translation. The main contributions of this thesis are as follows:

(1) We propose an Attention Long Short-Term Memory Network (ALSTM) for image captioning. To address the problem that the Long Short-Term Memory Network (LSTM) acquires inaccurate information at each time step, the ALSTM uses the previous hidden state to control the input information. We combine the ALSTM with four classical image captioning architectures and test the resulting methods on image captioning datasets to verify the effectiveness of the ALSTM.

(2) We propose an attention mechanism based on object regions for image captioning. By studying image captioning algorithms, we find that including clear semantic object information in each sub-region of the image improves captioning accuracy; in other words, correctly obtaining the object region information in the image is fundamental to image captioning. Building on ALSTM-based image captioning, we use Faster R-CNN to extract object region information and apply the attention mechanism to these object regions. We then use a reinforcement learning method to optimize the training of the image captioning architectures, and several experiments demonstrate the effectiveness of the reinforcement learning method and the object region features.
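The abstract gives no equations, so the following is only a minimal illustrative sketch of the two ideas it names: gating the input with the previous hidden state, and soft attention over object region features. The gate form, the dot-product scoring, and all names (`gate_input`, `attend_regions`, `W_g`, `U_g`) are assumptions made for illustration, not the thesis's actual formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate_input(x, h_prev, W_g, U_g):
    """Assumed input gate: scale each component of x by a value in (0, 1)
    computed from x and the previous hidden state h_prev.
    W_g has shape len(x) x len(x); U_g has shape len(x) x len(h_prev)."""
    pre = [a + b for a, b in zip(matvec(W_g, x), matvec(U_g, h_prev))]
    return [sigmoid(p) * xi for p, xi in zip(pre, x)]


def attend_regions(regions, h_prev):
    """Assumed dot-product soft attention over region feature vectors.
    Each region must have the same dimension as h_prev.
    Returns (weights, context); the weights sum to 1 over the regions."""
    scores = [sum(r_i * h_i for r_i, h_i in zip(r, h_prev)) for r in regions]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context
```

In a full model, the gated input and the attended context vector would be fed to an LSTM step, and the region features would come from a detector such as Faster R-CNN; both are omitted here to keep the sketch self-contained.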
Keywords/Search Tags: image captioning, attention mechanism, Long Short-Term Memory networks, object detection