Image Captioning Based On Automatic Constraint Loss

Posted on: 2020-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Xu
Full Text: PDF
GTID: 2428330620954835
Subject: Software engineering
Abstract/Summary:
Image captioning is a fundamental problem at the intersection of computer vision, natural language processing, and machine learning. The goal is to convert an image into a sentence that describes the relationships between the objects in it. Transforming complex image features into simple language descriptions has broad application prospects in image classification, image retrieval, and motion recognition.

Recently, many methods have adopted an encoder-decoder framework in which, during the training phase, the current state and the target word of the previous step are used to predict the current target word. In the prediction phase, the target word of the previous step is not available, so the word output by the previous step is used as the input of the current step instead. The training and prediction phases are therefore inconsistent: when the word generated at some step is not accurate enough, the rest of the caption may degrade. This thesis mainly studies the following:

(1) Given the inconsistency between training and prediction in the encoder-decoder framework, this thesis analyzes the impact of the problem on the RNN and on the attention mechanism, and proposes a method named automatic constraint loss. Rather than simply summing the losses of all time steps of the recurrent neural network, the method assigns a weight to the loss of each time step, so that the loss weight of the current step increases as the word error rate at the previous step decreases. This reduces the difference between the training phase and the prediction phase.

(2) In automatic constraint loss, the loss weight of the current step is controlled by the accuracy of the word at the previous step, which may ignore the influence that synonyms or near-synonyms of the previous target word should have on the weight update. Therefore, when calculating the loss weight of each step in the training phase, this thesis considers not only the accuracy of the target word at the previous step, but also the similarity between the probability distribution generated at the previous step and the target word, further improving the effect of automatic constraint loss on image captioning.

(3) The method is evaluated on the MSCOCO dataset. The experimental results show that, compared with the traditional maximum likelihood method, it achieves better results and enables the attention mechanism to select image regions more accurately.
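The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the idea in (1) and (2), under stated assumptions. It assumes the weight of step t is the probability the model assigned to the target word at step t-1 (so the weight grows as the previous-step error shrinks), that the first step has weight 1, and, for the variant in (2), that the weight is the expected similarity between the previous predicted distribution and the previous target word. The function name `weighted_caption_loss` and the `similarity` callback are hypothetical and are not taken from the thesis.

```python
import math


def weighted_caption_loss(step_probs, targets, similarity=None):
    """Weighted negative log-likelihood of one caption (illustrative sketch).

    step_probs: one probability distribution over the vocabulary per time step.
    targets:    one target word index per time step.
    similarity: optional function sim(i, j) returning a value in [0, 1] for
                two word indices (hypothetical; for example a clipped cosine
                similarity of word embeddings).
    Returns the weighted loss and the list of weights used at each step.
    """
    total = 0.0
    weight = 1.0  # assumption: the first step has no predecessor
    weights = []
    for probs, target in zip(step_probs, targets):
        # Per-step cross-entropy, clamped to avoid log(0).
        nll = -math.log(max(probs[target], 1e-12))
        total += weight * nll
        weights.append(weight)
        if similarity is None:
            # Assumed form for (1): weight of the next step is the
            # probability given to the current target word.
            weight = probs[target]
        else:
            # Assumed form for (2): probability mass on words similar to
            # the target also counts toward the next weight.
            weight = sum(p * similarity(i, target) for i, p in enumerate(probs))
    return total, weights


if __name__ == "__main__":
    probs = [[0.2, 0.8, 0.0], [0.1, 0.1, 0.8]]
    gold = [0, 2]
    print(weighted_caption_loss(probs, gold))
    # Toy similarity: identical words score 1, all other pairs 0.5.
    print(weighted_caption_loss(probs, gold, lambda i, j: 1.0 if i == j else 0.5))
```

With the toy similarity, the second step receives a larger weight than in the accuracy-only case, because probability placed on words similar to the target is no longer treated as a plain error. In a deep-learning framework one would likely treat the weights as constants during backpropagation, but that is also an assumption.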
Keywords/Search Tags:Image Captioning, Attention Mechanism, RNN, Automatic Constraint Loss