
Research On Sparse Reward Based On Reinforcement Learning

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: J L Fang
Full Text: PDF
GTID: 2428330602467135
Subject: Engineering

Abstract/Summary:
Since DeepMind's Go-playing program made headlines a few years ago, followed by AlphaZero's success at shogi and OpenAI's striking results on the e-sports game Dota 2, reinforcement learning has become widely known. Although traditional reinforcement learning algorithms converge well in simple environments, their range of application has been limited: they struggle in complex environments and cannot process raw sensory data from the environment directly. With the rapid development of deep learning, people have seen its advantages, so integrating deep neural networks into reinforcement learning has become a trend, and the resulting deep reinforcement learning algorithms have gradually become an important research direction in the field.

However, among the practical problems faced by reinforcement learning, sparse rewards remain one of the most pressing: even deep reinforcement learning algorithms cannot learn well in a sparse-reward environment. Researchers have continued to explore remedies, improving their models through hand-designed rewards, curriculum learning, curiosity mechanisms, hierarchical reinforcement learning, and other methods, in the hope of training better under sparse rewards. But the results are not very satisfactory, and each approach has many limitations.

This thesis builds on the experience replay technique of the DQN algorithm. It improves the rules for storing states and setting goals in the experience pool, and, drawing on the parallel ideas of the A3C algorithm, designs a parallelization framework that allows deep reinforcement learning algorithms to train better in sparse-reward environments. The concrete design is as follows. First, the experimental environment is preprocessed to facilitate network training, reduce the amount of computation, and improve efficiency. Then the policy-gradient-based DDPG algorithm is improved to optimize the experience replay technique. Next, based on the principle of the A3C algorithm, a parallelization framework is designed, which alleviates the strong correlation between training samples and thereby further improves the efficiency of network training. Finally, experiments are conducted, the results are compared and analyzed, and the approach is verified. Thanks to the improved experience pool and the parallel processing, the deep reinforcement learning algorithm in this thesis performs well in terms of both feasibility and stability. In the ALE game-platform environment used in the experiments and in a self-built simple environment, the algorithm outperforms the DDPG and A3C algorithms it is compared against, in both training efficiency and final performance; compared with traditional reinforcement learning algorithms, the improvement is even more pronounced.
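The abstract does not give the thesis's actual code for "improving the rules of state storage and setting of goals in the experience pool". As a rough illustration only, a minimal Python sketch of one common way to do this (relabeling stored transitions with goals actually achieved later in the episode, in the spirit of hindsight-style experience replay) might look like the following; all class and variable names here are illustrative assumptions, not the thesis's implementation:

```python
import random
from collections import deque

class GoalReplayBuffer:
    """Experience pool that relabels goals so episodes that failed
    under the sparse reward still yield useful learning signal.
    Illustrative sketch only, not the thesis's actual design."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, k=4):
        """episode: list of (state, action, next_state, goal) tuples."""
        for t, (s, a, s_next, goal) in enumerate(episode):
            # Store the original transition with its true (sparse) reward.
            self.buffer.append((s, a, s_next, goal,
                                self._reward(s_next, goal)))
            # Relabel with up to k goals actually reached later in the
            # episode, turning "failures" into successes w.r.t. new goals.
            future = random.sample(episode[t:], min(k, len(episode) - t))
            for (_, _, achieved, _) in future:
                self.buffer.append((s, a, s_next, achieved,
                                    self._reward(s_next, achieved)))

    @staticmethod
    def _reward(state, goal):
        # Sparse reward: 0 on reaching the goal, -1 otherwise.
        return 0.0 if state == goal else -1.0

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

The point of the relabeling step is that, even when the agent never reaches the intended goal, the experience pool still contains transitions with non-negative reward, which is exactly what a sparse-reward environment otherwise fails to provide.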
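The parallelization framework itself is not shown in the abstract. A minimal sketch of the underlying A3C-style idea, several workers exploring independent environment copies and feeding one shared experience pool so that training samples are less correlated, could look like this; the toy chain environment, worker function, and all names are illustrative assumptions:

```python
import threading
import random
from collections import deque

# Shared experience pool, guarded by a lock so that several
# worker threads can append transitions concurrently.
shared_pool = deque(maxlen=50000)
pool_lock = threading.Lock()

def worker(worker_id, n_steps, seed):
    """One parallel worker on a toy chain environment with a sparse
    reward: 1.0 only when the agent reaches state 5, else 0.0."""
    rng = random.Random(seed)
    state = 0
    for _ in range(n_steps):
        action = rng.choice([-1, 1])               # random exploration
        next_state = state + action
        reward = 1.0 if next_state == 5 else 0.0   # sparse reward
        with pool_lock:
            shared_pool.append((worker_id, state, action, reward, next_state))
        state = 0 if reward > 0 else next_state    # reset on success

threads = [threading.Thread(target=worker, args=(i, 100, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the four workers follow different trajectories, consecutive samples drawn from the shared pool come from different parts of the state space, which is the decorrelation effect the abstract attributes to the parallel design.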
Keywords/Search Tags: Reinforcement Learning, Deep Reinforcement Learning, Sparse Rewards, Experience Pool