Font Size: a A A

Design And Implementation Of Image Captioning Model Based On Deep Learning

Posted on:2019-01-08Degree:MasterType:Thesis
Country:ChinaCandidate:Y LiuFull Text:PDF
GTID:2428330566497304Subject:Software engineering
Abstract/Summary:
Image Captioning is a hot research issue in the area of Deep Learning that connects computer vision and natural language processing.Nowadays,The main focus of Image Captioning models is how to design a more effective visual attention mechanism,so that the model can extract and use the image features better in the process of generating the captions of the image.But most traditional approaches tend to adopt regular the language structure patterns.That is to say,they tend to fall into a stereotype of replicating frequent words or phrases in dataset,and can not make the model generate more rich and more varied image captions based on some unique characteristics of the image.This paper holds that the main reason of the issues discussed above is that the traditional models generally use the LSTM to generate the description of the image,which leads to the failure of the model to learn and use the syntactic features of the natural sentences.Therefore,this paper presents an Image Captioning model based on self-attention mechanism and spatial attention mechanism,which adopt popular Encoder-Decoder framework in design.In the Encoder module,Convolution Neural Network is applied to extract image features,and the Decoder of model consists of a number of sub modules instead of traditional LSTM,which are stacked by the multi-head spatial attention sublayer,the multi-head self-attention sublayer and the full-connection feedforward network sublayer.Among them,The multi-head spatial attention sublayer uses the spatial attention mechanism to select and utilize the image features and the multi-head self-attention sublayer based on self-attention mechanism can capture the syntactic or grammatical features in the natural sentence.After proposing and designing a new Image Captioning model,This paper introduces the specific implementation of each module of the model.In addition,this paper also introduces the test and evaluation of the model on the MSCOCO datasets,and the result shows its superior performance than the model only based on various os visual attention mechanisms.
Keywords/Search Tags:image caption, convolutional neural network, spatial attention, self-attention
Related items