
Research On Image Caption Generation Method Based On Deep Learning

Posted on: 2019-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: J F Wang
Full Text: PDF
GTID: 2438330548958407
Subject: Communication and Information System
Abstract/Summary:
In recent years, with the advent of large-scale data sets, deep learning has achieved great success in many traditional computer vision tasks owing to its powerful computing capabilities, especially in image recognition. However, most existing work assigns an image one or more discrete labels; it describes neither the relationships between the objects in the image nor what is happening in the image. To address this problem, this paper uses recent deep learning techniques to design a model that connects images with natural language, thereby achieving image caption generation.

The model designed in this paper consists of two parts: an image feature extraction part and a language modeling and generation part. The former uses a pre-trained convolutional neural network as a feature extractor, and the latter uses a recurrent LSTM network. On this basis, this paper designs two ways to connect the two parts into a single neural network that can be trained end to end.

(1) Connect the two parts through a fully connected layer, that is, feed the fully connected layer features of the convolutional neural network into the LSTM network. This method is simple to implement and computationally cheap, and it achieves basic image caption generation. Its disadvantage is that the global image feature is used only at initialization, and the spatial information between image contents is ignored.

(2) Adopt a new approach based on the attention mechanism, which is more complex and computationally intensive but can make full use of the image features at every location to produce better results. This method first extracts two-dimensional image features from a convolutional layer of the convolutional neural network, then maps the image feature vectors and the word vectors into the same dimensional space through two fully connected transforms, and computes the similarity between the two as the model's attention weights. Finally, the attended image features and the word vectors are used together to generate the word at the next time step.

(3) The two models are trained on the Flickr8K data set and its Chinese version Flickr8K-CN, realizing caption generation in both English and Chinese. Experiments show that the model adapts well to different languages, and the model with the attention mechanism outperforms the basic model without attention on all evaluation metrics.
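The following is a minimal sketch of the basic connection described in (1), assuming PyTorch and torchvision are available; the class name, layer sizes, and the choice of ResNet-50 as the pre-trained backbone are illustrative assumptions, not details taken from the thesis.

# Minimal sketch of the basic encoder-decoder captioner in (1):
# a frozen pre-trained CNN supplies one global feature that is fed to
# the LSTM only once, before the caption words.
import torch
import torch.nn as nn
import torchvision.models as models

class BasicCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Pre-trained CNN used purely as a feature extractor (weights frozen).
        cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop the classifier head
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Project the global image feature into the LSTM input space.
        self.img_fc = nn.Linear(2048, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # The global feature acts as the first "token" of the sequence,
        # so the image information is seen only at initialization.
        feats = self.encoder(images).flatten(1)          # (B, 2048)
        img_token = self.img_fc(feats).unsqueeze(1)      # (B, 1, E)
        words = self.embed(captions)                     # (B, T, E)
        seq = torch.cat([img_token, words], dim=1)       # image first, then words
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                          # logits over the vocabulary

Because the image enters the decoder only through this initial token, later time steps have no direct access to region-level features, which is exactly the limitation that motivates the attention-based variant.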
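For the attention mechanism in (2), a possible reading of "two fully connected transforms followed by a similarity" is additive attention over convolutional regions, sketched below; the module name and the tensor shapes (region_feats of shape (B, L, D_img) from the conv layer, word_vec of shape (B, D_word)) are assumptions for illustration.

# Illustrative attention step: project region features and the word vector
# into a common space, score their similarity, and return a weighted
# combination of the regions as the attended image feature.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, d_img, d_word, d_common):
        super().__init__()
        # Two fully connected transforms map both inputs into the same space.
        self.proj_img = nn.Linear(d_img, d_common)
        self.proj_word = nn.Linear(d_word, d_common)
        self.score = nn.Linear(d_common, 1)

    def forward(self, region_feats, word_vec):
        img_h = self.proj_img(region_feats)                           # (B, L, C)
        word_h = self.proj_word(word_vec).unsqueeze(1)                # (B, 1, C)
        # Similarity between each region and the current word is the attention
        # score; softmax turns the scores into weights over the L regions.
        scores = self.score(torch.tanh(img_h + word_h)).squeeze(-1)   # (B, L)
        weights = torch.softmax(scores, dim=-1)
        context = (weights.unsqueeze(-1) * region_feats).sum(dim=1)   # (B, D_img)
        return context, weights

The attended context vector would then be combined with the current word vector and passed to the LSTM to predict the word at the next time step.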
Keywords/Search Tags:deep learning, computer vision, image caption generation, attention model