Research And Application Of Deep Reinforcement Learning Algorithms Based On Reward Shaping

Posted on:2021-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Liu

Full Text:PDF

GTID:2518306338985829

Subject:Information and Communication Engineering

Abstract/Summary:

Reinforcement learning is a learning method in which an agent interacts with its environment to maximize cumulative reward and thereby find the optimal strategy. Deep learning has achieved success in image recognition, natural language processing, and autonomous driving. Deep reinforcement learning, which combines the representation power of deep learning with the decision-making ability of reinforcement learning, is gradually being applied to games, autonomous driving, and recommendation systems. However, when the reward is sparse or delayed, policy updates are obstructed and the agent learns poorly. Reward shaping is one of the main mechanisms for addressing this problem: it uses human prior knowledge to design more frequent rewards or training signals that guide strategy learning more effectively. Research in this area remains a focus of both academia and industry. This thesis studies reward shaping for deep reinforcement learning algorithms. Its main work is as follows.

First, the thesis proposes a Phased Goal Reward shaping method (PGR) that uses game images to express phased goals. The method represents each phased goal of the agent by an image of a game state and uses the frame interval to measure how far the phased goal has been completed, and from these it designs a phased goal reward function that guides the agent’s strategy learning. Learning performance is evaluated on the Kangaroo game in the Atari environment. Compared with the proximal policy optimization algorithm, the proposed method achieves a better environment score.

Second, the thesis proposes a reward shaping method (DEC) that introduces prior knowledge into the exploration reward mechanism to guide the direction of exploration, and combines it with the phased goal reward to obtain a deep reinforcement learning algorithm (PGR-DEC) that uses both the phased goal reward and the constrained exploration reward. The method introduces knowledge about the classification of positive and negative game results, represented by game images of states. For the exploration reward, it designs an exploration reward that is attenuated according to this prior knowledge, in order to guide the agent in sparse-reward environments. This keeps the exploration mechanism, so the agent still has opportunities to learn unknown strategies, while improving the efficiency of random exploration. On the Kangaroo game, the proposed PGR-DEC algorithm is evaluated against the Intrinsic Curiosity Module (ICM) algorithm. Experiments show that the proposed algorithm achieves higher environment scores.
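The abstract does not give the PGR reward function itself, so the following is only a minimal sketch of the general idea, under stated assumptions: frames are flattened lists of pixel intensities, a frame "matches" a goal image when the mean absolute pixel difference is below a threshold, and the bonus decays linearly with the number of frames elapsed since the previous goal. The class `PhasedGoalShaper`, its parameters, and the linear decay are all hypothetical choices for illustration, not the thesis's actual design.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical frame representation: a flattened grayscale image.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


@dataclass
class PhasedGoalShaper:
    goals: List[Frame]              # goal images, in the order they should be reached
    bonus: float = 1.0              # maximum shaping bonus per goal (assumed)
    max_frames: int = 100           # frame budget after which the bonus is zero (assumed)
    match_threshold: float = 0.05   # distance below which a frame matches a goal (assumed)
    next_goal: int = 0              # index of the goal the agent should reach next
    frames_since_goal: int = 0      # frames elapsed since the last goal was reached

    def shape(self, env_reward: float, frame: Frame) -> float:
        """Return env_reward, plus a bonus if the frame matches the next phased goal."""
        self.frames_since_goal += 1
        if self.next_goal >= len(self.goals):
            return env_reward  # all phased goals already completed
        if frame_distance(frame, self.goals[self.next_goal]) > self.match_threshold:
            return env_reward  # goal not reached on this frame
        # Fewer frames used means a larger bonus; it decays linearly to zero.
        remaining = max(0.0, 1.0 - self.frames_since_goal / self.max_frames)
        self.next_goal += 1
        self.frames_since_goal = 0
        return env_reward + self.bonus * remaining


# Usage: one goal image; the agent reaches it on the second frame.
shaper = PhasedGoalShaper(goals=[[1.0, 1.0]], max_frames=10)
r1 = shaper.shape(0.0, [0.0, 0.0])  # no match, environment reward only
r2 = shaper.shape(0.0, [1.0, 1.0])  # match after 2 frames, bonus added
r3 = shaper.shape(2.0, [1.0, 1.0])  # no goals left, environment reward only
```

In a training loop, the shaped value would stand in for the raw environment reward passed to the policy-gradient update (for example PPO), while the raw score is still what gets reported for evaluation.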

Keywords/Search Tags:

Deep reinforcement learning, Reward shaping, Phased goal reward, Exploration reward

Related items

1	Towards Design Of Intrinsic Rewards For Sparse Reward Problem
2	Research On Reward Optimization In Reinforcement Learning
3	Researches On Efficient Exploration Driven By Reward Function
4	Research And Implementation Of Sparse Reward Algorithm Based On Reinforcement Learning For Virtual Shooting Scenes
5	Research And Application Of Reward Shaping Based Reinforcement Learning
6	Theory and application of reward shaping in reinforcement learning
7	Feature Extraction In Deep Reinforcement Learning And Countermeasures For Sparse Reward
8	Q-learning Potential Reward Online Learning Technology Inspired By Priori Knowledge
9	Research On Deep Reinforcement Learning Algorithm Based On The Combination Of Intrinsic Reward And Auxiliary Tasks
10	Research On Robotic Arm Tracking And Grabbing Control Based On Fusion Reward PPO Algorithm