Research On Image Caption Generation Based On Deep Reinforcement Learning

Posted on: 2021-04-02; Degree: Master; Type: Thesis
Country: China; Candidate: H P Tong
GTID: 2428330614972015; Subject: Computer technology
Abstract/Summary:
Image caption generation combines image recognition and natural language processing to generate a descriptive sentence for a target image. It is of great significance in early childhood education, navigation for blind people, and human-computer interaction. In recent years, encoder-decoder models based on deep learning have become a research hotspot in image caption generation. This approach uses a convolutional neural network encoder to extract image features. During decoder training, the caption in each image-caption pair serves as the reference sentence and is vectorized. The resulting word vectors and the image features are input into a recurrent neural network decoder to generate the caption. In the training phase the input word vector is the ground truth, whereas in the test phase it is the prediction from the previous step.

Because the test phase relies on the output of the previous step, a single erroneous output leads to error accumulation. The approach also suffers from insufficient image feature extraction and from word vectors that deviate from the meaning of the reference sentence, all of which degrade the generated captions. To address these three problems, this thesis studies image caption generation methods based on deep reinforcement learning and introduces a context encoding network and the BERT model. The specific work is as follows:

(1) An image caption generation model, DR-ECRNN, is designed based on deep reinforcement learning with an embedded context encoding network. DR-ECRNN builds on a baseline model, Basic DR-ECRNN. To extract richer image features, a context encoding network is added to the baseline's deep residual network in the image encoding stage. The resulting image features are input into the LSTM for learning, and a policy network and a value network, which have strong decision-making ability, guide the generation of the description sentence. Experimental results show that, compared with the baseline model without the context encoding network, DR-ECRNN improves the BLEU score by an average of 0.8%, 0.58%, and 0.6% on the Microsoft COCO Caption 2014, Flickr8k, and Flickr30k datasets, respectively.

(2) An image caption generation model, DR-BCRNN, is designed based on deep reinforcement learning with the BERT model. Its encoder is the same as that of DR-ECRNN, and its decoder is an improved version of the DR-ECRNN decoder. To obtain more accurate word vectors, BERT is used to vectorize the reference sentence. This adds representations of each word's position in the sentence and of the relative position of the sentence to which the word belongs, and combines them with the context to generate vector representations of the reference sentence. These vectors are input into the LSTM for learning; the policy network provides the decisions, and the value network computes the reward value of the current decision. Experimental results show that, compared with DR-ECRNN without BERT, DR-BCRNN improves the BLEU score by an average of 0.63%, 0.60%, and 0.33% on the Microsoft COCO Caption 2014, Flickr8k, and Flickr30k datasets, respectively.

Based on the above work, this thesis also compares DR-ECRNN and DR-BCRNN with the related models m-RNN, NIC, and NIC + att from recent years. The results show that, on the three datasets above, the proposed models achieve better BLEU scores than the other models and improve on the other evaluation metrics to varying degrees.
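
The error accumulation problem described in the abstract can be illustrated with a toy sketch. The code below is not the thesis's model: the lookup table `NEXT_WORD`, the reference caption, and all function names are hypothetical stand-ins for a trained recurrent decoder. It only contrasts the two input regimes the abstract names: feeding the ground-truth previous word (training phase) versus feeding the decoder's own previous prediction (test phase).

```python
from typing import Dict, List

# Hypothetical next-word table standing in for a trained decoder.
# It contains one learned mistake: after "a" it predicts "cat",
# while the reference caption below continues with "dog".
NEXT_WORD: Dict[str, str] = {
    "<start>": "a",
    "a": "cat",
    "dog": "runs",
    "cat": "sleeps",
    "runs": "<end>",
    "sleeps": "<end>",
}

# Hypothetical reference caption for one image.
REFERENCE: List[str] = ["a", "dog", "runs", "<end>"]


def decode_teacher_forced(reference: List[str]) -> List[str]:
    """Training-style decoding: each step sees the ground-truth previous word."""
    outputs: List[str] = []
    prev = "<start>"
    for truth in reference:
        outputs.append(NEXT_WORD[prev])
        prev = truth  # feed the ground truth, not the prediction
    return outputs


def decode_free_running(max_len: int = 10) -> List[str]:
    """Test-style decoding: each step sees the decoder's own previous output."""
    outputs: List[str] = []
    prev = "<start>"
    for _ in range(max_len):
        word = NEXT_WORD[prev]
        outputs.append(word)
        if word == "<end>":
            break
        prev = word  # feed the prediction back in
    return outputs


def count_errors(predicted: List[str], reference: List[str]) -> int:
    """Count position-wise mismatches, plus any difference in length."""
    mismatches = sum(p != r for p, r in zip(predicted, reference))
    return mismatches + abs(len(predicted) - len(reference))


if __name__ == "__main__":
    forced = decode_teacher_forced(REFERENCE)
    free = decode_free_running()
    print("teacher forced:", forced, "errors:", count_errors(forced, REFERENCE))
    print("free running:  ", free, "errors:", count_errors(free, REFERENCE))
```

In this toy setup the teacher-forced pass makes one error, because the ground-truth input puts the decoder back on track at the next step. The free-running pass makes two, because the wrong word "cat" becomes the next input and drags the following word off the reference as well. Methods that train on the model's own sampled outputs, such as the reinforcement learning approach the abstract describes, are one way to reduce this mismatch between the two phases.
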
Keywords/Search Tags: Image Caption, Deep Reinforcement Learning, Context Encoding Network, BERT Embedding