
Research On Image Captioning Method Based On Deep Neural Networks And Adaptive Attention Mechanism

Posted on: 2021-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: D L Liang
GTID: 2428330620969913
Subject: Computer application technology
Abstract/Summary:
Image captioning combines two fields, computer vision and natural language processing, and is a very challenging research task. The task aims to let computers automatically generate a descriptive text for an image. Compared with traditional image captioning methods, neural network-based methods are more efficient and can generate more natural sentence descriptions for an image. This thesis combines deep neural networks with attention mechanisms to develop an efficient image captioning algorithm. The main research work and contributions are as follows:

(1) An image captioning model based on long short-term adaptive attention is proposed. Traditional attention-based image captioning models usually combine the attention mechanism with long short-term memory (LSTM) networks and adjust the model's attention according to the LSTM hidden state. However, the hidden state stores only limited information, so without sufficient information as a reference it is difficult for the model to locate the image regions that are most relevant at the current time step. To address this problem, the proposed model uses the hidden state and the memory cell state of the LSTM to guide two separate attention modules and connects them through an adjustment factor, so that the model can refer to both kinds of information at once when inferring which areas of the image to attend to at the current time step. Experiments and comparisons with mainstream image captioning models verify the validity of the proposed model.

(2) Building on the model above, a further improvement is proposed. The weighted image features produced by the attention module change at every time step during word generation, and feeding them into the LSTM together with the word vector hinders the LSTM from learning text sequences. The improved method therefore inputs the global features of the image to the LSTM instead of the weighted image features. The related experimental results show that this change further improves the performance of the model.
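The abstract gives no equations, so the following is only a minimal NumPy sketch of how the two ideas could be wired together, not the thesis's actual formulation. It assumes additive (tanh-scored) attention for both modules, a scalar sigmoid gate standing in for the adjustment factor, and the mean of the region features standing in for the global image feature. All names (`additive_attention`, `dual_state_attention`, `w_beta`, the dimensions) are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def additive_attention(regions, guide, W_v, W_g, w):
    """Assumed additive attention: score each of the k regions against a guide vector.

    regions: (k, d) region features, guide: (n,) LSTM state.
    Returns attention weights (k,) and the weighted image feature (d,).
    """
    scores = np.tanh(regions @ W_v + guide @ W_g) @ w  # (k,)
    alpha = softmax(scores)
    return alpha, alpha @ regions

def dual_state_attention(regions, h, c, params):
    """One attention module guided by the hidden state h, one by the cell state c,
    blended by a scalar adjustment factor beta (the gate form is an assumption)."""
    alpha_h, ctx_h = additive_attention(regions, h, *params["hidden"])
    alpha_c, ctx_c = additive_attention(regions, c, *params["cell"])
    beta = sigmoid(np.concatenate([h, c]) @ params["w_beta"])
    context = beta * ctx_h + (1.0 - beta) * ctx_c
    return context, beta, alpha_h, alpha_c

def init_params(d, n, a, rng):
    """Random illustrative weights; a is the attention projection size."""
    def branch():
        return (rng.normal(scale=0.1, size=(d, a)),
                rng.normal(scale=0.1, size=(n, a)),
                rng.normal(scale=0.1, size=a))
    return {"hidden": branch(), "cell": branch(),
            "w_beta": rng.normal(scale=0.1, size=2 * n)}

rng = np.random.default_rng(0)
k, d, n, a, e = 49, 8, 6, 5, 4          # regions, feature dim, state dim, attn dim, word dim
regions = rng.normal(size=(k, d))       # stand-in for CNN region features
h, c = rng.normal(size=n), rng.normal(size=n)
params = init_params(d, n, a, rng)

context, beta, alpha_h, alpha_c = dual_state_attention(regions, h, c, params)

# Contribution (2), as sketched here: the LSTM input pairs the word vector with a
# fixed global image feature rather than the step-varying attended feature.
word_vec = rng.normal(size=e)
global_feat = regions.mean(axis=0)      # assumed definition of the global feature
lstm_input = np.concatenate([word_vec, global_feat])
```

In this sketch the attended `context` would be used when predicting the next word, while `lstm_input` stays tied to the same global feature at every step, which is one way to read the motivation given in contribution (2).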
Keywords/Search Tags:convolutional neural network, long short-term memory networks, adaptive adjustment, attention mechanism