
Research On Image Caption Method Based On High Level Semantic Extraction And Attention Mechanism

Posted on: 2022-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: S H Wang
Full Text: PDF
GTID: 2518306320475454
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the internet and artificial intelligence, researchers are paying increasing attention to image captioning, especially in the fields of human-computer interaction and children's education. By combining feature extraction from computer vision with sequence output from natural language processing, image captioning aims to have a computer describe image content in natural language, completing the transformation from vision to language. Unlike object detection, image recognition and other image understanding tasks, image captioning must not only identify the objects contained in the image but also accurately identify the background, the relationships between the objects, and other information. In recent years, scholars have made breakthroughs in image captioning algorithms based on deep learning. This thesis focuses on image captioning algorithms built on the encoder-decoder framework.

First, this thesis finds that most current image captioning methods extract only global image features, so the extracted image information has a single scale and the resulting captions are not comprehensive enough. To address this, it proposes a high-level image semantic extraction method based on a region network. Global image features are extracted with the VGGNet-16 network, and local feature information is then extracted with a region proposal network. Finally, the high-level image semantics of the global features and the local feature information are combined to guide the decoder in generating the caption, so that the model grasps the overall information of the image while the local information makes the caption more comprehensive.

Second, this thesis studies the application of the attention mechanism in the decoder. The traditional attention mechanism is relatively simple: the decoder takes everything obtained from the encoder, useful and redundant information alike, directly as its input. The redundant information can mislead the output and produce an incorrect caption. This thesis therefore proposes an image captioning method based on Spatial Attention and Visual Attention. The Spatial Attention is improved to strengthen the correlation between the attention results and the queries. The Visual Attention mechanism allows the model to automatically balance its focus between the visual signals of the image and the language model. The decoder uses the improved Spatial Attention and the Visual Attention mechanisms to fuse the hidden layer of the Long Short-Term Memory (LSTM) network with the acquired image context information and obtain the final caption.

Finally, the model is trained on the MS COCO and Flickr30K datasets, and the trained model is tested and scored on a variety of evaluation metrics. The experimental scores show that the proposed image captioning method, based on high-level semantic extraction and the attention mechanism, outperforms the other image captioning methods it is compared with.
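As an illustration of the general idea of attending over combined global and local features, the following is a minimal sketch of standard dot-product soft attention. It is not the thesis's improved Spatial Attention or its Visual Attention, whose details the abstract does not give. The function names, the feature vectors and the hidden state are all hypothetical values chosen for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def soft_attention(hidden, features):
    """Score each feature vector against the decoder hidden state,
    normalize the scores, and return (weights, weighted-sum context)."""
    weights = softmax([dot(hidden, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return weights, context

# Toy inputs (assumed, not from the thesis): one global feature vector
# followed by two region-level feature vectors.
global_feature = [0.2, 0.1, 0.0]
region_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
features = [global_feature] + region_features

# Hypothetical LSTM hidden state at a single decoding step.
hidden = [2.0, 0.0, 0.0]

weights, context = soft_attention(hidden, features)
```

In this toy setup the first region vector is the one most aligned with the hidden state, so it receives the largest weight and dominates the context vector passed on to the decoder.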
Keywords/Search Tags: Image Captioning, High-level Semantics, Attention Mechanism, Long Short-Term Memory, Encoder-Decoder