Research On Image Caption Method Based On High-level Image Semantic And Attention

Posted on:2019-06-20

Degree:Master

Type:Thesis

Country:China

Candidate:C Fang

Full Text:PDF

GTID:2428330545954781

Subject:Computer software and theory

Abstract/Summary:

In recent decades, computer technology has developed rapidly, and machine learning and related theory have steadily improved. Neural networks have achieved remarkable results in many fields. Automatically generating a description of an image is a popular and difficult problem in current computer science research. In essence, the computer must detect and identify the objects in an image and perceive the scene and its content. Unlike object detection and image classification, this task spans two major areas: computer vision and natural language processing. When describing an image, the computer should attend not only to the individual objects and their categories but, more importantly, to the relationships between objects, and it should describe them in coherent language.

This thesis first surveys traditional image description algorithms and briefly introduces several types. Traditional algorithms only pass the extracted features to the language generation module for text generation and ignore the high-level semantics of the image itself. This thesis proposes an image description generation algorithm that incorporates high-level image semantics. A VGG network is first trained as a single-label classification model on the ImageNet data set. On this basis, a dictionary is constructed from the MS COCO data set, and the training labels of the MS COCO Caption data set are preprocessed. The last layer of the model is then modified and trained for multi-label classification on the MS COCO Caption data. Next, the BING algorithm is used to select candidate regions, multi-label classification is performed on each region, and max pooling is applied to suppress noise and improve the result.

Second, the attention mechanism is analyzed. Traditional attention attends only to the image feature map and does not fully consider the previously generated words, even though predicting the next word sometimes requires little or no information from the image. An improved attention mechanism is therefore proposed. By adding a weight variable, it automatically learns when to attend to the words already generated, when to attend to the image, and how much attention to pay to each. A multimodal layer is then added after the attention layer to fuse the hidden state of the recurrent neural network, the attention information, and the high-level semantics of the image.

Finally, experiments were conducted on the MS COCO and Flickr30K data sets, and the results were compared with algorithms from previous work. The experiments show that the proposed image description method, based on high-level image semantics and attention, effectively improves the quality of image descriptions.
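Two of the steps described in the abstract can be illustrated with a minimal sketch: max pooling of per-region multi-label scores, and an attention step with a learned weight that balances image context against previously generated words. This is not the thesis implementation. The function names, the sigmoid form of the gate, and the use of a single "language vector" to summarise the generated words are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pool_region_scores(region_scores):
    """Max-pool label scores across candidate regions.

    region_scores is a list with one entry per region; each entry is a
    list of per-label scores. Taking the maximum per label keeps the
    strongest evidence for each label and discards weak, noisy regions.
    """
    return [max(label_column) for label_column in zip(*region_scores)]


def gated_attention(region_feats, attn_logits, language_vec, gate_logit):
    """Blend attended image features with a language vector.

    Assumed form (hypothetical, not taken from the thesis):
        beta    = sigmoid(gate_logit)          # weight on generated words
        visual  = sum_i alpha_i * region_i     # standard soft attention
        context = beta * language_vec + (1 - beta) * visual
    """
    alphas = softmax(attn_logits)
    dim = len(language_vec)
    visual = [
        sum(a * feat[d] for a, feat in zip(alphas, region_feats))
        for d in range(dim)
    ]
    beta = 1.0 / (1.0 + math.exp(-gate_logit))
    context = [beta * l + (1.0 - beta) * v for l, v in zip(language_vec, visual)]
    return context, alphas, beta


if __name__ == "__main__":
    # Two regions, three labels: the pooled vector keeps the per-label maximum.
    print(pool_region_scores([[0.1, 0.9, 0.2], [0.7, 0.3, 0.1]]))

    # Two regions with 2-d features, equal attention logits, neutral gate.
    ctx, alphas, beta = gated_attention(
        [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 2.0], 0.0
    )
    print(ctx, alphas, beta)
```

In this sketch a gate value near 1 means the next word is predicted mostly from the words already generated, and a value near 0 means it is predicted mostly from the image. In a trained model the gate logit and attention logits would be produced by learned layers rather than passed in by hand.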

Keywords/Search Tags:

convolutional neural network, high-level image semantics, attention mechanism, image feature extraction, LSTM

Related items

1	Research On Image Caption Method Based On High Level Semantic Extraction And Attention Mechanism
2	Research On Image Captioning Models Based On High-Level Semantics
3	Design And Implementation Of Video Semantic Analysis System Based On CNN And LSTM
4	Research On Image Caption Algorithm Based On Attention Mechanism
5	Research On Low-Light Image Algorithm Based On Attention Mechanism
6	Research On Image Content Understanding And Visual Reasoning Algorithm Based On Attention Mechanism
7	Study On Event Image Classification By Fusing Multiple CNNs Based On LSTM
8	Research And Implementation Of Chinese Image Natural Sentence Generation Technology
9	Research On Fine-grained Image Classification Based On Deep Convolutional Neural Network And Dual-domain Attention Mechanism
10	Research On Semantic-Attentive Deep Image Captioning Method