
Research On Sparse Reward Based On Reinforcement Learning

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: J L Fang
Full Text: PDF
GTID: 2428330602467135
Subject: Engineering

Abstract/Summary:
Since DeepMind's Go-playing program made headlines a few years ago, followed by AlphaZero's success at shogi and OpenAI's striking results on the e-sports game Dota 2, reinforcement learning has become widely known. Although traditional reinforcement learning algorithms converge well in simple environments, their range of application has been limited: they struggle in complex environments and cannot process raw sensory data from the environment directly. With the rapid development of deep learning, people have seen its advantages, so integrating deep neural networks into reinforcement learning has become a trend, and the resulting deep reinforcement learning algorithms have gradually become an important research direction in the field.

However, among the practical problems faced by reinforcement learning, sparse rewards remain one of the most pressing: even deep reinforcement learning algorithms cannot learn well in a sparse-reward environment. Researchers have continued to explore remedies, improving their models through hand-designed rewards, curriculum learning, curiosity mechanisms, hierarchical reinforcement learning, and other methods, in the hope of training better under sparse rewards. But the results are not very satisfactory, and each approach has many limitations.

This thesis builds on the experience replay technique of the DQN algorithm. It improves the rules for storing states and setting goals in the experience pool, and, drawing on the parallel ideas of the A3C algorithm, designs a parallelization framework that allows deep reinforcement learning algorithms to train better in sparse-reward environments. The concrete design is as follows. First, the experimental environment is preprocessed to facilitate network training, reduce the amount of computation, and improve efficiency. Then the policy-gradient-based DDPG algorithm is improved to optimize the experience replay technique. Next, based on the principle of the A3C algorithm, a parallelization framework is designed, which alleviates the strong correlation between training samples and thereby further improves the efficiency of network training. Finally, experiments are conducted, the results are compared and analyzed, and the approach is verified. Thanks to the improved experience pool and the parallel processing, the deep reinforcement learning algorithm in this thesis performs well in terms of both feasibility and stability. In the ALE game-platform environment used in the experiments and in a self-built simple environment, the algorithm outperforms the DDPG and A3C algorithms it is compared against, in both training efficiency and final performance; compared with traditional reinforcement learning algorithms, the improvement is even more pronounced.
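The abstract does not give the thesis's actual code for "improving the rules of state storage and setting of goals in the experience pool". As a rough illustration only, a minimal Python sketch of one common way to do this (relabeling stored transitions with goals actually achieved later in the episode, in the spirit of hindsight-style experience replay) might look like the following; all class and variable names here are illustrative assumptions, not the thesis's implementation:

```python
import random
from collections import deque

class GoalReplayBuffer:
    """Experience pool that relabels goals so episodes that failed
    under the sparse reward still yield useful learning signal.
    Illustrative sketch only, not the thesis's actual design."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, k=4):
        """episode: list of (state, action, next_state, goal) tuples."""
        for t, (s, a, s_next, goal) in enumerate(episode):
            # Store the original transition with its true (sparse) reward.
            self.buffer.append((s, a, s_next, goal,
                                self._reward(s_next, goal)))
            # Relabel with up to k goals actually reached later in the
            # episode, turning "failures" into successes w.r.t. new goals.
            future = random.sample(episode[t:], min(k, len(episode) - t))
            for (_, _, achieved, _) in future:
                self.buffer.append((s, a, s_next, achieved,
                                    self._reward(s_next, achieved)))

    @staticmethod
    def _reward(state, goal):
        # Sparse reward: 0 on reaching the goal, -1 otherwise.
        return 0.0 if state == goal else -1.0

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

The point of the relabeling step is that, even when the agent never reaches the intended goal, the experience pool still contains transitions with non-negative reward, which is exactly what a sparse-reward environment otherwise fails to provide.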
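The parallelization framework itself is not shown in the abstract. A minimal sketch of the underlying A3C-style idea, several workers exploring independent environment copies and feeding one shared experience pool so that training samples are less correlated, could look like this; the toy chain environment, worker function, and all names are illustrative assumptions:

```python
import threading
import random
from collections import deque

# Shared experience pool, guarded by a lock so that several
# worker threads can append transitions concurrently.
shared_pool = deque(maxlen=50000)
pool_lock = threading.Lock()

def worker(worker_id, n_steps, seed):
    """One parallel worker on a toy chain environment with a sparse
    reward: 1.0 only when the agent reaches state 5, else 0.0."""
    rng = random.Random(seed)
    state = 0
    for _ in range(n_steps):
        action = rng.choice([-1, 1])               # random exploration
        next_state = state + action
        reward = 1.0 if next_state == 5 else 0.0   # sparse reward
        with pool_lock:
            shared_pool.append((worker_id, state, action, reward, next_state))
        state = 0 if reward > 0 else next_state    # reset on success

threads = [threading.Thread(target=worker, args=(i, 100, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the four workers follow different trajectories, consecutive samples drawn from the shared pool come from different parts of the state space, which is the decorrelation effect the abstract attributes to the parallel design.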
Keywords/Search Tags: Reinforcement Learning, Deep Reinforcement Learning, Sparse Rewards, Experience Pool