
Generating Image Captions Based On Deep Reinforcement Learning

Posted on: 2018-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Lin
Full Text: PDF
GTID: 2428330623450711
Subject: Control Science and Engineering
Abstract/Summary:
Generating image captions has been a research topic in both computer vision and natural language processing. The task requires a model to identify the objects in an image, understand the relationships between them, and describe them in natural language. Unlike discrete image semantic labeling, image caption generation requires describing the whole image in natural language: the description should cover not only the main objects in the image but also the relationships between them. How to design better models and algorithms that simulate human cognition and learning has long been a research subject of widespread concern.

This thesis presents an image captioning model based on deep reinforcement learning. A problem with current methods is that training the language model by maximum likelihood estimation leads to an inconsistency between the training loss function and the evaluation measures; moreover, the model is prone to accumulating errors at test time. We design a model based on the policy gradient algorithm, which computes the gradient of a new objective function from the evaluation measures and updates the network parameters accordingly. The Monte Carlo method is used to estimate the return in the reinforcement learning formulation, and a return baseline is used to reduce the variance of the objective gradient and accelerate convergence. The model actively learns the low-level features of an image and independently guides caption generation according to the evaluation measures.

We further present three models that integrate image semantic features into the above image captioning framework, trained in an end-to-end manner. To incorporate attributes, we construct architectural variants that feed image representations and attributes into RNNs in different ways, exploring the mutual but fuzzy relationship between them. These models combine image semantic features with low-level visual features to further improve the accuracy of caption generation.

We show that by optimizing evaluation measures such as BLEU, CIDEr, METEOR and ROUGE, we can build a system that improves on these metrics and generates sentences that are relevant to the image, semantically fluent, and grammatically correct. We further show that by also optimizing the recently introduced SPICE metric, which measures the semantic quality of captions, we can produce a system that significantly outperforms other methods as measured by human evaluation.
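A minimal, self-contained sketch of the kind of training scheme described above (REINFORCE-style policy gradient with a Monte Carlo return and a baseline) is given below for illustration only. It assumes PyTorch; the names CaptionPolicy and dummy_reward are hypothetical placeholders, the decoder is a toy GRU rather than the thesis's CNN-plus-attribute architecture, and the reward is a stand-in for a sentence-level metric such as CIDEr.

```python
# Minimal sketch: REINFORCE with a running-average baseline for captioning.
# All names here are illustrative, not the thesis's actual implementation.
import torch
import torch.nn as nn

class CaptionPolicy(nn.Module):
    """Toy RNN decoder: image feature -> word sequence (the policy pi_theta)."""
    def __init__(self, feat_dim=2048, vocab_size=1000, hidden=512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def sample(self, feats, max_len=16):
        """Sample one caption per image; return token ids and their log-probs."""
        h = torch.tanh(self.init_h(feats))
        word = torch.zeros(feats.size(0), dtype=torch.long)  # <BOS> id = 0
        tokens, logps = [], []
        for _ in range(max_len):
            h = self.rnn(self.embed(word), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            word = dist.sample()
            tokens.append(word)
            logps.append(dist.log_prob(word))
        return torch.stack(tokens, 1), torch.stack(logps, 1)

def dummy_reward(sampled_tokens):
    """Stand-in for a sentence-level metric (CIDEr, BLEU, ...).
    In practice this would compare sampled captions with reference captions."""
    return torch.rand(sampled_tokens.size(0))

policy = CaptionPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4)
baseline = 0.0  # running-average baseline to reduce gradient variance

for step in range(3):  # toy loop on random "image features"
    feats = torch.randn(8, 2048)
    tokens, logps = policy.sample(feats)
    reward = dummy_reward(tokens)            # Monte Carlo return for each sample
    advantage = reward - baseline            # subtract the baseline
    # REINFORCE objective: maximize E[(R - b) * sum_t log pi(w_t | w_<t, image)]
    loss = -(advantage.detach().unsqueeze(1) * logps).sum(1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    baseline = 0.9 * baseline + 0.1 * reward.mean().item()
```

Subtracting a baseline from the Monte Carlo return leaves the gradient estimate unbiased while reducing its variance, which is what accelerates convergence in this kind of training scheme.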
Keywords/Search Tags: Image captions, Deep reinforcement learning, CNN, Policy Gradient Algorithm