Research On Image Caption Method Based On Attention Feedback Mechanism

Posted on: 2020-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Deng
Full Text: PDF
GTID: 2428330572461803
Subject: Engineering
Abstract/Summary:
With the rapid development of society and the continued spread of Internet technology, hundreds of millions of images are transmitted every day, and image information is stored digitally on the Internet. Images have become an important medium for transmitting information over networks, so effectively managing and identifying the images one needs is an important and highly relevant research topic. The goal of the image caption task is to extract the feature information of an image and generate a caption sentence that approximates the real context. Traditional image caption methods include semantic template filling, feature space matching, and CNN-RNN. The captions generated by these methods suffer from the exposure bias problem, remain far from the real context, and lack uniqueness. Methods based on the attention mechanism operate as a one-way propagation, which leads to distracted attention and disordered generated sentences. To address these problems, this paper designs a stacked network structure to process attention information. To analyze the relationship between the image and the generated text effectively, it introduces a text feedback structure based on the attention mechanism, which effectively ensures that the attention description information of the input and the output match and makes the generated sentence more accurate. The main research work of this paper is divided into the following points:

(1) In-depth study of sequence generation methods based on the attention mechanism. Image captioning is treated as essentially a multi-label image classification problem, in which the multiple labels of an image can be regarded as a short sequence. The method uses a CNN to extract image features and then feeds the feature information into an LSTM. During prediction, the attention mechanism focuses on different image regions at different LSTM decoding steps and predicts the words for the attended image region. The experimental results show that the method is 3% to 4% better than the CNN-RNN based method.

(2) An image caption model based on the attention feedback mechanism is proposed. The model uses an encoder-decoder framework. The encoder uses a CNN to extract the feature information of the image. The decoder is designed as a stacked network structure, Q-LSTM, which handles attention information from top to bottom so that each layer of the network can obtain additional feature information. An attention feature is then extracted from the generated text and fed back to the attention area of the image. The attention area in the image is corrected iteratively, which strengthens the matching of key information between image and text and optimizes the generated sentence. Experiments were carried out on data sets including Flickr8k, Flickr30k and MSCOCO. The experimental results show that the recognition rate of the proposed model is 5% to 9% higher than that of the classical image caption model.

(3) Design and implementation of an image caption system. The system is built on the attention feedback image caption model proposed in this paper. It consists mainly of an image caption module and an image caption record query module. Through its interface, it shows that the proposed model can effectively describe the information in an image and generate reasonable caption sentences.
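
The attention step described in point (1), where the decoder weights different image regions at different decoding steps, can be sketched roughly as follows. This is only an illustrative toy and not the model from the thesis: the region features stand in for CNN output, the state vectors stand in for LSTM hidden states, and the dot-product scoring and the names `softmax` and `attend` are assumptions, since the abstract does not specify how attention scores are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """One soft-attention step over image regions.

    region_features: list of per-region feature vectors (stand-ins for CNN output).
    hidden_state: current decoder state (stand-in for an LSTM hidden vector).
    Returns (weights, context); context is the attention-weighted sum of regions.
    """
    # Assumption: dot-product scoring. The abstract does not name a scoring function.
    scores = [sum(f * h for f, h in zip(region, hidden_state))
              for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [sum(w * region[d] for w, region in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Toy data: three image regions with 2-dimensional features (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two hypothetical decoder states at different decoding steps.
weights_t1, context_t1 = attend(regions, [2.0, 0.0])
weights_t2, context_t2 = attend(regions, [0.0, 2.0])
```

With these toy values, the first decoder state puts more weight on region 0 than on region 1, and the second state reverses that preference, which is the "different regions at different times" behavior the abstract describes. In a full captioning model the context vector would be fed to the decoder together with the previous word to predict the next one.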
Keywords/Search Tags: deep learning, recurrent neural network, image caption, attention mechanism, convolutional neural network