
Research On Deep Neural Networks Models For Image Captioning

Posted on: 2018-10-12    Degree: Master    Type: Thesis
Country: China    Candidate: Q P Chen    Full Text: PDF
GTID: 2348330533961565    Subject: Software engineering
Abstract/Summary:
In recent years, Artificial Intelligence and Machine Learning have advanced rapidly and been widely applied in Computer Vision, Speech Recognition, and Natural Language Processing. Image Captioning combines Computer Vision and Natural Language Processing to translate an image into a natural-language description. It is useful not only for image retrieval but also for helping disabled users communicate on the Internet. Compared with traditional approaches to the task, models that combine deep Convolutional Neural Networks (CNNs) with Recurrent Neural Networks (RNNs) can recognize novel objects in unseen images and describe them more naturally. At present, one shortcoming of the classical model for Image Captioning, the Neural Image Caption (NIC) model, is that its descriptions are not always accurate about the details of an image. This paper addresses this shortcoming from two research strategies, one driven by methods and one driven by problems, so that captions attend more closely to the details of image regions.

From the method perspective, the shortcoming can be mitigated by adding an attention mechanism component to the NIC model, which is commonly used for the image captioning task; this helps the model focus on the details of an image while generating its caption. From the problem perspective, this paper batch-normalizes the neural network layers of the previously proposed Fully Convolutional Localization Networks (FCLN) model for the Dense Captioning task, which accelerates model training and improves the accuracy of region detection and captioning.

The main research work of this paper includes:
(1) Analyzing recent research on the Image Captioning task, noting that most works build on Google's NIC model, with improvements focused on the CNN encoder, the RNN language model, the encoding input function, word embeddings, and other components. For the Dense Captioning task, surveying the works based on the FCLN model.
(2) To address the NIC model's lack of detail information in captions, proposing an attention mechanism to improve caption generation. First, keywords for specific regions of the image are detected by Multiple Instance Learning (MIL). Then, the keywords' embedding vectors are fed into the hidden layers of the Long Short-Term Memory (LSTM) network in the NIC model, drawing the language model's attention to these keywords as it generates the caption's words.
(3) For the classic Dense Captioning model, designing a scheme to batch-normalize each neural network layer, which accelerates the convergence of model training and improves the accuracy of the regions detected and captions generated by the model. To fit the structure of the network layers without breaking end-to-end training of the whole model, the Batch Normalization transform must be adapted for both the CNN layers and the LSTM hidden layers.
(4) Training the proposed models separately on an Image Captioning dataset and a Dense Captioning dataset, and evaluating the quality of the results with the evaluation metrics corresponding to each model. For a fair comparison of the improvements, the two original classic models are trained from scratch and their results used as baselines. The final evaluation of caption accuracy shows that both improved models outperform the originals, and the second model also converges faster.
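The Batch Normalization transform applied to each layer in contribution (3) follows the standard formulation: activations are normalized to zero mean and unit variance over the mini-batch, then rescaled and shifted by learnable parameters. A minimal NumPy sketch of the inference-free training-time transform is shown below; the function name and array shapes are illustrative, not the thesis's actual implementation:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch Normalization over a mini-batch.

    x:     (batch, features) activations
    gamma: (features,) learnable scale
    beta:  (features,) learnable shift
    """
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # restore representational capacity

# Example: normalize a batch of 8 activations with 4 features each
x = np.random.randn(8, 4) * 3.0 + 2.0
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With gamma = 1 and beta = 0, the output has approximately zero mean and unit variance per feature, which is what stabilizes and speeds up training; adapting this transform to recurrent LSTM hidden states (as contribution (3) requires) additionally involves deciding which statistics to share across time steps.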
Keywords/Search Tags:Image Caption, Deep Neural Networks Model, Multiple Instance Learning, Attention Mechanism, Batch Normalization