
Research And Application Of Deep Reinforcement Learning Algorithms Based On Reward Shaping

Posted on: 2021-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Liu
GTID: 2518306338985829
Subject: Information and Communication Engineering
Abstract/Summary:
Reinforcement learning is a learning method in which an agent interacts with its environment and seeks the optimal policy by maximizing the cumulative reward. Deep learning has achieved success in image recognition, natural language processing, and autonomous driving. Deep reinforcement learning combines the representational power of deep learning with the decision-making capability of reinforcement learning, and it is increasingly applied to games, autonomous driving, and recommendation systems. However, when rewards are sparse or delayed, policy updates are obstructed and the agent learns poorly. Reward shaping is one of the main mechanisms for addressing this problem. It draws on human prior knowledge to design more frequent rewards or training signals, which guide policy learning more effectively. Reward shaping remains an active research topic in both academia and industry. This thesis studies reward shaping for deep reinforcement learning algorithms. Its main contributions are as follows:

(1) This thesis proposes a Phased Goal Reward shaping method (PGR), which uses game images to express phased goals. The method represents each phased goal of the agent by an image of a game state. It then uses the frame interval to measure the completion of that goal and, from this, designs a phased goal reward function that guides the agent's policy learning. The method is evaluated on the Kangaroo game in the Atari environment, where it achieves a higher environment score than the proximal policy optimization (PPO) algorithm.

(2) This thesis proposes a reward shaping method (DEC), which introduces prior knowledge into the exploration reward mechanism to guide the direction of exploration. It also builds a deep reinforcement learning algorithm (PGR-DEC) that combines the phased goal reward with this constrained exploration reward. The prior knowledge classifies game outcomes as positive or negative, and it is represented by images of game states. For the exploration reward, the method designs an exploration reward based on decaying prior knowledge, which guides the agent in sparse-reward environments. This keeps the exploration mechanism intact, so the agent retains opportunities to learn previously unknown policies, while making random exploration more efficient. The proposed PGR-DEC algorithm and the intrinsic curiosity module (ICM) algorithm are both evaluated on the Kangaroo game. Experiments show that the proposed algorithm achieves higher environment scores.
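The phased goal reward (PGR) described above can be illustrated with a minimal Python sketch. The abstract does not say how a frame is matched to a goal image or how the frame interval becomes a reward, so everything below is an assumption made for illustration:

- Frames are flattened grayscale pixel sequences.
- A goal counts as reached when the mean absolute pixel difference falls below a hypothetical `match_threshold`.
- The reward decays linearly with the number of frames elapsed since the previous goal, so faster completion earns more.

The class `PhasedGoalReward` and all of its parameters are hypothetical; this is not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Assumption: a frame is a flattened grayscale image with pixel values in [0, 255].
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


@dataclass
class PhasedGoalReward:
    """Hypothetical phased-goal shaper: goals are reference frames reached in order."""

    goal_frames: List[Frame]         # one reference image per phased goal (assumption)
    match_threshold: float = 10.0    # max mean pixel difference that counts as "reached"
    goal_bonus: float = 1.0          # reward for reaching a goal instantly
    max_interval: int = 500          # frame interval at which the bonus decays to zero
    next_goal: int = 0               # index of the goal currently being pursued
    frames_since_last_goal: int = 0  # frames elapsed since the previous goal or reset

    def reset(self) -> None:
        """Restart goal tracking at the beginning of an episode."""
        self.next_goal = 0
        self.frames_since_last_goal = 0

    def __call__(self, frame: Frame) -> float:
        """Return the shaping reward for the current frame."""
        if self.next_goal >= len(self.goal_frames):
            return 0.0  # all phased goals already completed this episode
        self.frames_since_last_goal += 1
        goal = self.goal_frames[self.next_goal]
        if frame_distance(frame, goal) > self.match_threshold:
            return 0.0  # current phased goal not reached yet
        # Goal reached: the fewer frames it took, the larger the reward.
        interval = self.frames_since_last_goal
        reward = self.goal_bonus * max(0.0, 1.0 - interval / self.max_interval)
        self.next_goal += 1
        self.frames_since_last_goal = 0
        return reward
```

During training, the value returned for each frame would be added to the environment reward before the policy update (for example, a PPO update). Suitable values for `match_threshold` and `max_interval` would be game-specific.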
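The prior-guided, decaying exploration reward (DEC) described above can be sketched in the same hedged way. The abstract compares against ICM but does not give the form of its own exploration bonus. This sketch therefore swaps in a simple count-based novelty bonus, `beta / sqrt(N(s))`, as a stand-in. The other pieces are also assumptions:

- A hypothetical `prior_label` callable plays the role of the positive/negative game-outcome knowledge. It returns +1, -1, or 0 for a state key.
- The prior's weight is multiplied by `decay` after every call, so the bonus gradually reverts to plain novelty-driven exploration.
- The state key is a hashable summary of the frame, such as a hash of a downsampled image.

All names here are illustrative, not taken from the thesis.

```python
import math
from collections import defaultdict
from typing import Callable, Hashable


class DecayingPriorExplorationBonus:
    """Hypothetical sketch: count-based novelty bonus scaled by a decaying prior."""

    def __init__(
        self,
        prior_label: Callable[[Hashable], int],
        beta: float = 0.1,
        prior_weight: float = 1.0,
        decay: float = 0.999,
    ) -> None:
        # +1: resembles a positive outcome, -1: resembles a negative outcome, 0: unknown.
        self.prior_label = prior_label
        self.beta = beta                  # scale of the novelty bonus
        self.prior_weight = prior_weight  # current influence of the prior knowledge
        self.decay = decay                # multiplicative attenuation applied per call
        self.counts = defaultdict(int)    # visit counts per state key

    def __call__(self, state_key: Hashable) -> float:
        """Return the exploration bonus for visiting `state_key`."""
        self.counts[state_key] += 1
        novelty = self.beta / math.sqrt(self.counts[state_key])
        # The prior boosts states that look like positive outcomes and damps
        # states that look like negative ones; the scale never goes below zero.
        scale = max(0.0, 1.0 + self.prior_weight * self.prior_label(state_key))
        self.prior_weight *= self.decay  # attenuate the prior over time
        return novelty * scale
```

The prior's weight decays toward zero while the novelty term remains. Exploration is therefore steered early in training but never switched off, which matches the abstract's description of keeping the exploration mechanism while making it more efficient. In a PGR-DEC-style setup, this bonus would presumably be summed with the environment reward and the phased goal reward.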
Keywords/Search Tags: Deep reinforcement learning, Reward shaping, Phased goal reward, Exploration reward