
Generating Image Captions Based On Deep Reinforcement Learning

Posted on: 2018-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Lin
Full Text: PDF
GTID: 2428330623450711
Subject: Control Science and Engineering
Abstract/Summary:
Generating image captions has been a research topic in both computer vision and natural language processing. The task requires a model to identify the objects in an image, understand the relationships between them, and describe them in natural language. Unlike discrete image semantic labeling, image caption generation requires describing the whole image in natural language: the description should cover not only the main objects in the image but also the relationships between them. How to design better models and algorithms that simulate human cognition and learning has long been a research subject of widespread concern.

This thesis presents an image captioning model based on deep reinforcement learning. A problem with current methods is that training the language model by maximum likelihood estimation leads to an inconsistency between the training loss function and the evaluation measures; moreover, the model is prone to accumulating errors at test time. We design a model based on the policy gradient algorithm, which computes the gradient of a new objective function from the evaluation measures and updates the network parameters accordingly. The Monte Carlo method is used to estimate the return in the reinforcement learning formulation, and a return baseline is used to reduce the variance of the objective gradient and accelerate convergence. The model actively learns the low-level features of an image and independently guides caption generation according to the evaluation measures.

We further present three models that integrate image semantic features into the above image captioning framework, trained in an end-to-end manner. To incorporate attributes, we construct architectural variants that feed image representations and attributes into RNNs in different ways, exploring the mutual but fuzzy relationship between them. These models combine image semantic features with low-level visual features to further improve the accuracy of caption generation.

We show that by optimizing evaluation measures such as BLEU, CIDEr, METEOR and ROUGE, we can build a system that improves on these metrics and generates sentences that are relevant to the image, semantically fluent, and grammatically correct. We further show that by also optimizing the recently introduced SPICE metric, which measures the semantic quality of captions, we can produce a system that significantly outperforms other methods as measured by human evaluation.
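A minimal, self-contained sketch of the kind of training scheme described above (REINFORCE-style policy gradient with a Monte Carlo return and a baseline) is given below for illustration only. It assumes PyTorch; the names CaptionPolicy and dummy_reward are hypothetical placeholders, the decoder is a toy GRU rather than the thesis's CNN-plus-attribute architecture, and the reward is a stand-in for a sentence-level metric such as CIDEr.

```python
# Minimal sketch: REINFORCE with a running-average baseline for captioning.
# All names here are illustrative, not the thesis's actual implementation.
import torch
import torch.nn as nn

class CaptionPolicy(nn.Module):
    """Toy RNN decoder: image feature -> word sequence (the policy pi_theta)."""
    def __init__(self, feat_dim=2048, vocab_size=1000, hidden=512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def sample(self, feats, max_len=16):
        """Sample one caption per image; return token ids and their log-probs."""
        h = torch.tanh(self.init_h(feats))
        word = torch.zeros(feats.size(0), dtype=torch.long)  # <BOS> id = 0
        tokens, logps = [], []
        for _ in range(max_len):
            h = self.rnn(self.embed(word), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            word = dist.sample()
            tokens.append(word)
            logps.append(dist.log_prob(word))
        return torch.stack(tokens, 1), torch.stack(logps, 1)

def dummy_reward(sampled_tokens):
    """Stand-in for a sentence-level metric (CIDEr, BLEU, ...).
    In practice this would compare sampled captions with reference captions."""
    return torch.rand(sampled_tokens.size(0))

policy = CaptionPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4)
baseline = 0.0  # running-average baseline to reduce gradient variance

for step in range(3):  # toy loop on random "image features"
    feats = torch.randn(8, 2048)
    tokens, logps = policy.sample(feats)
    reward = dummy_reward(tokens)            # Monte Carlo return for each sample
    advantage = reward - baseline            # subtract the baseline
    # REINFORCE objective: maximize E[(R - b) * sum_t log pi(w_t | w_<t, image)]
    loss = -(advantage.detach().unsqueeze(1) * logps).sum(1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    baseline = 0.9 * baseline + 0.1 * reward.mean().item()
```

Subtracting a baseline from the Monte Carlo return leaves the gradient estimate unbiased while reducing its variance, which is what accelerates convergence in this kind of training scheme.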
Keywords/Search Tags: Image captions, Deep reinforcement learning, CNN, Policy Gradient Algorithm