Research On Scene-based Image Semantic Description Generation Technology

Posted on:2021-02-14

Degree:Master

Type:Thesis

Country:China

Candidate:X Wang

GTID:2428330611450330

Subject:Electronics and Communications Engineering

Abstract/Summary:

Image description generation lies at the intersection of computer vision and natural language processing. It applies a series of complex processing steps to an input image and outputs a corresponding natural language description. By connecting image information with natural language, it lets a computer understand the content of an image and describe it in words. The scene-based image semantic description generation technology studied here consists of two parts: an encoder and a decoder. The encoder is a convolutional neural network that extracts information from the input image, discards unimportant features, retains the valuable ones, and sends them to the decoder. The decoder is composed of an attention mechanism and a recurrent neural network. Based on the extracted image features and the hidden state of the previous recurrent step, the attention mechanism produces weighted image features. From these attention-weighted features, the hidden state, and the cell state, the recurrent neural network generates the corresponding natural language description. The research contents of this thesis are as follows:

(1) In terms of the encoder, the problems of the ResNet-101 convolutional neural network are addressed by using the EfficientNet convolutional neural network, which is scaled in depth, width, and resolution. EfficientNet not only performs very well in image feature processing but also has a small number of parameters. Experimental results show that the image description generation model with an EfficientNet encoder performs 1.65 percent better, and trains in slightly less time, than the model based on the ResNet convolutional neural network.

(2) In terms of the decoder, this thesis discusses the problem of overfitting in the implementation of the attention mechanism, introduces batch normalization to mitigate it, and studies the structure of the attention mechanism. To improve its performance, an activation layer is introduced between the two linear layers. To address the large number of parameters to be optimized and the long training time of the LSTM recurrent neural network, a GRU recurrent neural network replaces the LSTM to process the features from the encoder. This reduces the number of parameters of the whole decoder network and accelerates training of the model. Experimental results show that using the improved attention mechanism and the GRU-based recurrent neural network as the decoder not only improves the quality of the generated natural language descriptions but also reduces the number of decoder parameters and thus the training time. Finally, the improved image description generation model is used to implement scene-based image description.
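
The attention step described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the thesis's implementation. It assumes additive soft attention and a ReLU as the activation between the two linear layers, and it omits batch normalization. All function and parameter names (`soft_attention`, `init_params`, `W_f`, `W_h`, `w_out`) are hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, hidden, params):
    """Weight image-region features by relevance to the previous hidden state.

    features: list of region feature vectors from the encoder
    hidden:   hidden state of the previous recurrent step
    Returns (context vector, attention weights).
    """
    proj_h = linear(params["W_h"], params["b_h"], hidden)
    scores = []
    for f in features:
        # First linear layer: project the region feature into attention space.
        proj_f = linear(params["W_f"], params["b_f"], f)
        # Activation between the two linear layers (ReLU is an assumption).
        act = relu([a + b for a, b in zip(proj_f, proj_h)])
        # Second linear layer: reduce to one scalar score per region.
        scores.append(linear(params["w_out"], params["b_out"], act)[0])
    alpha = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(dim)]
    return context, alpha


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(feat_dim, hid_dim, att_dim, seed=0):
    """Small random weights for illustration only; a real model learns them."""
    rng = random.Random(seed)
    return {
        "W_f": random_matrix(att_dim, feat_dim, rng), "b_f": [0.0] * att_dim,
        "W_h": random_matrix(att_dim, hid_dim, rng), "b_h": [0.0] * att_dim,
        "w_out": random_matrix(1, att_dim, rng), "b_out": [0.0],
    }
```

In a decoder of the kind the abstract describes, the returned context vector would be fed, together with the previous hidden state, into the recurrent unit (LSTM or GRU) at each time step to predict the next word.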

Keywords/Search Tags:

encoder, decoder, convolutional neural network, recurrent neural network, attention mechanism

Related items

1	Research On Image Description Method Based On Multimodal Recurrent Neural Networks
2	Research On Image Semantic Segmentation Based On Convolutional Neural Network
3	Research Of Text Sentiment Analysis Methods Based On Neural Network
4	Text Classification Research Based On Deep Neural Network And Attention Mechanism
5	Research On Speech Emotion Recognition Based On Convolutional Recurrent Neural Network
6	Design Of Mathematical Formula Recognition System Based On Convolutional Neural Network And Attention Mechanism
7	Visual Data Understanding Based On Deep Encoder-Decoder Framework
8	Research On Sentiment Analysis Algorithm Of Commodities Review Based On Convolutional Recurrent Neural Network
9	Research On Music Source Separation Algorithm Based On Deep Convolutional Neural Network And Its Application
10	The Cross-site Script Detection Based On Deep Learning