Research On Image Caption Via Incorporating Attention And Long Short-Term Memory Network

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y D You
Full Text: PDF
GTID: 2428330611963214
Subject: Electronic and communication engineering
Abstract/Summary:
Image captioning is a multi-modal artificial intelligence technology that combines machine vision and natural language processing. It enables a machine to generate sentences describing the semantic content of an image, which has extensive application value in the construction of intelligent transportation and smart cities. Traditional template-based and retrieval-based methods produce captions that are not flexible enough, and their limitations are obvious. The encoder-decoder framework built from convolutional neural networks and recurrent neural networks provides a complete solution for image captioning tasks and is increasingly favored by researchers. However, existing deep learning-based methods still suffer from low accuracy, slow training speed, and low scores on the evaluation metrics. To address these problems, this thesis proposes the following methods and strategies.

1) We propose an image captioning model based on InceptionResNet-V2 and a convolutional attention mechanism. Within an encoder-decoder caption model fused with an image attention mechanism, we use InceptionResNet-V2 as the feature extraction network to improve the model's ability to extract image features and to make the generated language descriptions more explicit. To address slow training, we replace the fully connected operations in the traditional image attention mechanism with fully convolutional operations, which removes a large number of model parameters.

2) We propose an image captioning model based on residual connections and a language attention mechanism. Related research shows that using a two-layer LSTM with an attention mechanism in the decoder can enhance the language model's ability to generate captions. However, the two-layer LSTM has a large number of parameters and greater depth, which makes gradients vanish easily. To solve this problem, this thesis adds residual connections between the two LSTM layers, which also increases the relevance of the word vectors. In the feature extraction phase, this thesis uses an object detection network to extract image features, so that the model attends to the main regions of the image from the beginning. This thesis also designs an attention mechanism based on language features, which improves the performance of the language model.

Finally, this thesis uses a reinforcement learning strategy to optimize both models. In this strategy, the CIDEr score serves as the reward for the language generation model, and a greedy algorithm is used in optimizing the two models above. In summary, this thesis studies and improves image captioning algorithms based on deep learning. Experimental results show that the proposed image captioning algorithms effectively improve model performance and generate caption sentences more accurately than traditional methods.
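The reinforcement learning step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact formulation, so this assumes a self-critical style setup in which the CIDEr score of a sampled caption is compared against the CIDEr score of the greedily decoded caption, and the difference weights the log-probability of the sampled caption. The function names `self_critical_loss` and `batch_loss` are hypothetical, and the reward values are made-up numbers for illustration, not results from the thesis.

```python
from typing import List, Sequence, Tuple


def self_critical_loss(
    sample_log_probs: Sequence[float],
    sampled_reward: float,
    greedy_reward: float,
) -> float:
    """Policy-gradient loss for one sampled caption.

    Assumption: the reward (e.g. CIDEr) of the greedily decoded caption
    acts as the baseline for the sampled caption's reward.
    """
    advantage = sampled_reward - greedy_reward
    # Minimizing this loss raises the log-probability of the sampled
    # caption when it scores above the greedy one, and lowers it otherwise.
    return -advantage * sum(sample_log_probs)


def batch_loss(items: List[Tuple[Sequence[float], float, float]]) -> float:
    """Mean loss over (log_probs, sampled_reward, greedy_reward) triples."""
    losses = [self_critical_loss(lp, rs, rg) for lp, rs, rg in items]
    return sum(losses) / len(losses)


# Illustrative values only: per-word log-probabilities of a sampled caption
# and hypothetical CIDEr scores for the sampled and greedy captions.
example = ([-0.5, -1.0, -0.25], 1.2, 0.9)
print(self_critical_loss(*example))
```

In a real training loop the log-probabilities would come from the LSTM decoder and the loss would be differentiated with respect to its parameters; the rewards themselves are treated as constants.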
Keywords/Search Tags: Attention mechanism, Image caption, Convolutional neural network, Long short-term memory