
Image Description Algorithm Based On Hierarchical Reinforcement Learning

Posted on: 2020-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2438330572979808
Subject: Computer Science and Technology
Abstract/Summary:
The image description task is to generate a short sentence that describes the objects and scenes in an image; it realizes a mapping from the visual space to the linguistic space. By constructing a network that can perceive and understand the subtle contextual information in an image, the observed scene is connected with the real world and a simple, accurate description is produced. Although this is a very simple thing for humans, it poses many challenges for machines: it requires not only a vision algorithm that understands the content of the image, but also a natural language model that translates that understanding into the correct words.

Although many image description algorithms have recently been proposed and have achieved very good objective evaluation scores, problems remain: the generated sentences differ considerably from the actual annotations, and some key information in the image is not well described. The main reason is that most algorithms learn the image description task with supervised learning, yet model performance is evaluated with metrics completely different from the supervised training objective. In this regard, this thesis proposes a novel model framework based on hierarchical reinforcement learning that optimizes the network directly with the objective evaluation metrics.

First, the model uses a Faster-RCNN object detector to encode the contextual information of the image into high-dimensional vectors. For the decoding process, this thesis adopts a "divide and conquer" idea: learning textual feature information from image information is treated as a reinforcement learning process consisting of the following parts. (1) The Manager runs at a lower temporal resolution; its purpose is to produce a higher-level policy that emits semantically meaningful sub-goals. (2) The Worker runs at a higher temporal resolution; it accepts the sub-goal information from the high-level Manager and generates the concrete policy, i.e. the specific word sequence. Concretely, the Manager issues a new sub-goal to the Worker, and the Worker completes each sub-goal by generating words in turn. In addition, both the Manager network and the Worker network apply attention mechanisms to their input features, allowing the Manager to focus on longer-term dynamics while the Worker's attention is narrowed to goal-driven local dynamics.
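The abstract gives no implementation details, so the following is a minimal PyTorch sketch of such a Manager-Worker decoding loop under stated assumptions: the additive attention form, the hidden sizes, the fixed sub-goal horizon, and the way the goal vector conditions the Worker are illustrative choices, not necessarily the thesis's exact design. Region features are assumed to arrive from a Faster-RCNN detector as a (regions, 2048) matrix per image.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Attention(nn.Module):
        """Additive attention over Faster-RCNN region features."""
        def __init__(self, feat_dim, hidden_dim, attn_dim=512):
            super().__init__()
            self.feat_proj = nn.Linear(feat_dim, attn_dim)
            self.hid_proj = nn.Linear(hidden_dim, attn_dim)
            self.score = nn.Linear(attn_dim, 1)

        def forward(self, regions, hidden):
            # regions: (B, R, feat_dim), hidden: (B, hidden_dim)
            e = self.score(torch.tanh(self.feat_proj(regions)
                                      + self.hid_proj(hidden).unsqueeze(1)))
            alpha = F.softmax(e, dim=1)          # attention weights over regions
            return (alpha * regions).sum(dim=1)  # context vector, (B, feat_dim)

    class HierarchicalDecoder(nn.Module):
        """Manager emits a sub-goal every `horizon` steps (low temporal
        resolution); the Worker emits one word per step (high temporal
        resolution), conditioned on the current sub-goal."""
        def __init__(self, vocab_size, feat_dim=2048, hidden_dim=512,
                     goal_dim=256, horizon=4):
            super().__init__()
            self.hidden_dim, self.horizon = hidden_dim, horizon
            self.embed = nn.Embedding(vocab_size, hidden_dim)
            self.mgr_attn = Attention(feat_dim, hidden_dim)
            self.wrk_attn = Attention(feat_dim, hidden_dim)
            self.manager = nn.LSTMCell(feat_dim, hidden_dim)
            self.to_goal = nn.Linear(hidden_dim, goal_dim)
            self.worker = nn.LSTMCell(hidden_dim + feat_dim + goal_dim, hidden_dim)
            self.logits = nn.Linear(hidden_dim, vocab_size)

        def forward(self, regions, max_len=16, bos_id=0):
            B = regions.size(0)
            h_m = c_m = regions.new_zeros(B, self.hidden_dim)
            h_w = c_w = regions.new_zeros(B, self.hidden_dim)
            word = torch.full((B,), bos_id, dtype=torch.long,
                              device=regions.device)
            goal, outputs = None, []
            for t in range(max_len):
                if t % self.horizon == 0:
                    # Manager attends to longer-term image dynamics
                    ctx_m = self.mgr_attn(regions, h_m)
                    h_m, c_m = self.manager(ctx_m, (h_m, c_m))
                    goal = self.to_goal(h_m)     # new sub-goal for the Worker
                # Worker attends to goal-driven local dynamics
                ctx_w = self.wrk_attn(regions, h_w)
                inp = torch.cat([self.embed(word), ctx_w, goal], dim=-1)
                h_w, c_w = self.worker(inp, (h_w, c_w))
                scores = self.logits(h_w)
                word = scores.argmax(dim=-1)     # greedy decoding for the sketch
                outputs.append(scores)
            return torch.stack(outputs, dim=1)   # (B, max_len, vocab_size)

    # Example: 36 detected regions with 2048-d Faster-RCNN features per image
    regions = torch.randn(2, 36, 2048)
    model = HierarchicalDecoder(vocab_size=10000)
    print(model(regions).shape)  # torch.Size([2, 16, 10000])

Running the Manager only every `horizon` steps is what gives it the lower temporal resolution described above; in the thesis the sub-goal boundaries are presumably learned rather than fixed, which this sketch does not model.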
The main innovations of the hierarchical reinforcement learning model are as follows. (1) The entire network architecture, including the Manager network, the Worker network, and the attention mechanisms, is designed on the basis of the hierarchical reinforcement learning model. The whole model is trained with the hierarchical reinforcement learning method without introducing additional data labels, and high evaluation scores were obtained in a large number of experiments, verifying the validity of the model. (2) An object detector is used to encode the complex information in the image. Compared with existing methods that use a deep convolutional network, the advantage of the object detector is that, because it classifies and localizes objects during the training phase, it can obtain the semantic information and position information of multiple targets in the image. (3) A self-confrontation training algorithm is proposed to address the inefficiency of the policy gradient algorithm under single-sample training. With only a small number of samples drawn from the model under different environmental conditions, it is difficult to accurately estimate how good the model's current action is. Policy-based optimization algorithms therefore usually introduce a baseline reward that measures the average return in a state, so that the benefit of each sampled action in that state can be estimated accurately; but with small-sample sampling and a large state space, it is difficult to estimate the baseline and the returns of actions in different states. By training a dual model, self-confrontation training accurately estimates the baselines of different states during sampling, and can therefore estimate an accurate policy gradient with fewer samples. A large number of experiments show that this method achieves faster learning and higher evaluation scores.
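The abstract does not specify the exact form of the dual model, so the following is a minimal sketch of one policy gradient step with a self-estimated baseline, in the spirit of self-critical training; the function name and the assumption that the baseline caption comes from a greedy decode of the same network are illustrative, not the thesis's exact algorithm.

    import torch

    def self_critical_pg_loss(log_probs, sample_reward, baseline_reward):
        # log_probs:       (B, T) log-probabilities of the *sampled* caption
        # sample_reward:   (B,) metric score (e.g. CIDEr) of the sampled caption
        # baseline_reward: (B,) score of the caption produced by the dual model
        #                  (assumed here: a greedy decode of the same network)
        advantage = (sample_reward - baseline_reward).unsqueeze(1)  # (B, 1)
        # REINFORCE: raise the probability of captions that beat the baseline,
        # lower the probability of those that fall below it
        return -(advantage * log_probs).sum(dim=1).mean()

Because the baseline tracks the model's own current performance, the advantage is centered near zero, so the gradient estimate stays usable even with a single sampled caption per image, which is exactly the low-sample regime the abstract describes.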
Keywords/Search Tags: image caption, hierarchical reinforcement learning, encoder-decoder, Faster-RCNN