Research On Image Caption Method Based On High-level Image Semantic And Attention

Posted on: 2019-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Fang
GTID: 2428330545954781
Subject: Computer software and theory
Abstract/Summary:
In recent decades, computer technology has developed rapidly, and machine learning and related theory have steadily matured. Neural networks have achieved remarkable results in many fields. Automatically generating a description of an image is both a popular and a difficult problem in current computer science research. In essence, the computer must detect and identify the objects in the image and also perceive the scene and its content. Unlike object detection and image classification, this task spans two major areas: computer vision and natural language processing. When describing an image, the computer should attend not only to the individual objects and their categories but, more importantly, to the relationships between the objects, and it should express them in coherent language.

This thesis first surveys traditional image description algorithms and briefly introduces several types. Traditional algorithms only pass the extracted image features to the language generation module for text generation and ignore the high-level semantics of the image itself. This thesis therefore proposes an image description generation algorithm that incorporates high-level image semantics. A VGG network is first trained as a single-label classification model on the ImageNet data set. A dictionary is then built from the MS COCO data set and the training labels of the MS COCO Caption data set are preprocessed. Next, the last layer of the model is modified and the model is trained for multi-label classification on the MS COCO Caption data. Finally, the BING algorithm selects candidate regions, multi-label classification is performed on each region, and max pooling over the regions is used to suppress noise and improve the result.

Second, the attention mechanism is analyzed. Traditional attention attends only to the image feature map and does not fully consider the previously generated words, even though predicting the next word sometimes requires little or no information from the image. An improved attention mechanism is therefore proposed. By adding a weight variable, it automatically learns when to attend to the words already generated, when to attend to the image, and how much attention to pay to each. A multi-modal layer is then added after the attention layer to fuse the hidden state of the recurrent neural network, the attention output, and the high-level semantics of the image. Finally, experiments were conducted on the MS COCO and Flickr30k data sets and compared with the algorithms of previous researchers. The experiments show that the proposed image description method based on high-level image semantics and attention effectively improves the quality of the generated descriptions.
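
The abstract gives no equations, so the following is only a minimal sketch of two of the ideas it mentions, under stated assumptions. `max_pool_region_scores` shows max pooling over per-region multi-label scores, and `gated_attention` shows one plausible reading of the "weight variable": a sigmoid gate `beta` that blends a word-context vector with the attended image vector. All function names, the sigmoid form of the gate, and the convex combination are assumptions made for illustration, not the formulation used in the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool_region_scores(region_scores):
    """Element-wise max over per-region label scores.

    region_scores: one row of label scores per candidate region.
    Returns one score per label: the highest score any region gave it,
    so a label missed by most regions is kept if one region is confident.
    """
    return [max(column) for column in zip(*region_scores)]


def gated_attention(region_feats, attn_scores, word_context, gate_logit):
    """Blend an attended image vector with a word-context vector.

    Assumption: beta = sigmoid(gate_logit) plays the role of the learned
    weight. beta near 1 relies on the generated words, beta near 0 on
    the image. In a trained model gate_logit would come from the decoder.
    """
    alphas = softmax(attn_scores)  # attention weights over regions
    dim = len(word_context)
    # Weighted sum of region features (the usual soft-attention context).
    visual = [
        sum(a * feat[i] for a, feat in zip(alphas, region_feats))
        for i in range(dim)
    ]
    beta = 1.0 / (1.0 + math.exp(-gate_logit))
    context = [beta * w + (1.0 - beta) * v for w, v in zip(word_context, visual)]
    return context, alphas, beta


if __name__ == "__main__":
    # Two candidate regions scored against three labels.
    print(max_pool_region_scores([[0.1, 0.8, 0.05], [0.7, 0.2, 0.1]]))
    # Two 2-dimensional region features, equal attention scores, neutral gate.
    print(gated_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 2.0], 0.0))
```

In a full captioning model the resulting context vector would be passed, together with the decoder hidden state and the pooled label scores, to the multi-modal layer described above; that fusion step is not sketched here.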
Keywords/Search Tags: convolutional neural network, high-level image semantics, attention mechanism, image feature extraction, LSTM