Research On Scene-based Image Semantic Description Generation Technology

Posted on: 2021-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2428330611450330
Subject: Electronics and Communications Engineering
Abstract/Summary:
Image description generation lies at the intersection of computer vision and natural language processing. It applies a series of processing steps to an input image and outputs a natural language description of that image. By connecting image information with natural language, it lets a computer interpret the content of an image and describe it in words. The scene-based image semantic description generation technology studied here consists of two parts: an encoder and a decoder. The encoder is a convolutional neural network that extracts information from the input image, discards unimportant features, retains the valuable ones and passes them to the decoder. The decoder is composed of an attention mechanism and a recurrent neural network. The attention mechanism uses the extracted image features and the hidden state of the previous recurrent step to compute weighted image features. From these attention-weighted features, the hidden state and the cell state, the recurrent neural network generates the corresponding natural language description. The research contents of this paper are as follows:

(1) In terms of the encoder, the shortcomings of the ResNet-101 convolutional neural network are addressed by replacing it with the EfficientNet convolutional neural network, which scales network depth, width and input resolution together. EfficientNet not only performs very well in image feature extraction, but also has a small number of parameters. Experimental results show that the image description generation model with an EfficientNet encoder performs 1.65 percent better than the model with a ResNet encoder, and its training time is slightly shorter.

(2) In terms of the decoder, this paper discusses the overfitting that occurs in the attention mechanism, introduces batch normalization to reduce it, and studies the structure of the attention mechanism. To improve the performance of the attention mechanism, an activation layer is inserted between its two linear layers. The LSTM recurrent neural network has a large number of parameters to optimize and a long training time, so it is replaced with a GRU recurrent neural network to process the features from the encoder. This reduces the number of parameters in the whole decoder network and accelerates training. Experimental results show that using the improved attention mechanism and the GRU as the decoder not only improves the quality of the generated natural language descriptions, but also reduces the number of decoder parameters and therefore the training time. Finally, the improved image description generation model is used to implement scene-based image description.
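
The two decoder ideas above can be illustrated with a small sketch. It is not the thesis implementation: the function names, the choice of tanh as the activation between the two linear layers, and the tiny list-based layout are all assumptions made for illustration, and batch normalization is left out. The first part scores each image region with a linear layer, an activation and a second linear layer, then turns the scores into weights with a softmax. The second part counts the parameters of one LSTM layer (four gate blocks) and one GRU layer (three gate blocks) under the textbook formulation with a single bias vector per block.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, hidden, w_feat, w_hid, v):
    """Soft attention over image regions (hypothetical helper).

    features: one feature vector per image region (R lists of length D)
    hidden:   previous recurrent hidden state (length H)
    w_feat:   first linear layer applied to features (A rows of length D)
    w_hid:    first linear layer applied to the hidden state (A rows of length H)
    v:        second linear layer mapping the activation to a score (length A)
    """
    hid_proj = [dot(row, hidden) for row in w_hid]
    scores = []
    for region in features:
        feat_proj = [dot(row, region) for row in w_feat]
        # Activation between the two linear layers; tanh is an assumption.
        act = [math.tanh(f + h) for f, h in zip(feat_proj, hid_proj)]
        scores.append(dot(v, act))
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * region[d] for w, region in zip(weights, features))
               for d in range(dim)]
    return weights, context


def recurrent_param_count(input_size, hidden_size, gate_blocks):
    """Parameters of one recurrent layer with one bias vector per gate block.

    Some libraries keep two bias vectors per block, which changes the
    totals but not the 3:4 ratio between GRU and LSTM.
    """
    per_block = (hidden_size * input_size      # input-to-hidden weights
                 + hidden_size * hidden_size   # hidden-to-hidden weights
                 + hidden_size)                # bias
    return gate_blocks * per_block


def lstm_param_count(input_size, hidden_size):
    return recurrent_param_count(input_size, hidden_size, 4)


def gru_param_count(input_size, hidden_size):
    return recurrent_param_count(input_size, hidden_size, 3)
```

Under these assumptions, an input size and hidden size of 512 give 2,099,200 parameters for the LSTM layer and 1,574,400 for the GRU layer, a reduction of one quarter, which is consistent with the shorter training time the abstract reports for the GRU decoder.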
Keywords/Search Tags: encoder, decoder, convolutional neural network, recurrent neural network, attention mechanism