
Research On Reinforcement Learning Based On Asynchronous Method

Posted on: 2020-12-14    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Zhao    Full Text: PDF
GTID: 2428330590952088    Subject: Computer application technology
Abstract/Summary:
Reinforcement learning (RL) is an important machine learning method. Grounded in principles from animal psychology, RL adopts the trial-and-error mechanism found in human and animal learning, emphasizes learning from interaction with the environment, and uses evaluative feedback signals to optimize decisions. Asynchronous reinforcement learning (ARL) is a recently popular RL approach: several parallel actor-learners explore the environment, and each actor-learner updates the global parameters online. This alleviates two problems of traditional RL algorithms, namely slow convergence and a tendency to fall into local minima.

However, for problems with discrete state spaces, existing ARL algorithms have not combined model-based methods with the asynchronous approach effectively, which limits convergence precision; convergence speed also needs further improvement. For problems with continuous state spaces, neural networks are usually combined with RL: each agent pushes gradient information to the global thread, and the global thread updates its parameters according to the information pushed by each agent. Yet existing ARL algorithms do not account for the different information transmitted by different threads at each update, which limits convergence speed. Moreover, while existing ARL algorithms alleviate, to some extent, the tendency of traditional RL to fall into local minima, they do not solve the problem completely. Starting from asynchronous reinforcement learning, this thesis combines several techniques to improve ARL algorithms and to raise their convergence speed and convergence precision. The main contents are as follows.

1. Research on asynchronous model-based reinforcement learning algorithms. To let agents make full use of the information explored during asynchronous updates, this thesis introduces model-based methods into ARL and proposes an asynchronous Dyna-Q algorithm. The algorithm divides agents into actors and learners: actors explore the environment, update their own parameters, and store the experience they collect, while learners update the global parameters and guide the actors' exploration based on that stored experience. To further improve convergence speed and precision, the thesis then introduces a phased method into asynchronous Dyna-Q, yielding the asynchronous phased Dyna-Q algorithm (APDyna-Q). APDyna-Q divides the agents' learning process into phases and applies a different learning strategy in each phase, so that agents can fully exploit the explored information when updating parameters. Experimental results show that both asynchronous Dyna-Q and APDyna-Q are effective: compared with traditional RL algorithms and existing ARL algorithms, they greatly improve convergence speed and convergence precision.
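As a concrete illustration of the model-based mechanism described above, the following minimal sketch shows a single textbook tabular Dyna-Q update (direct RL, model learning, then planning). It is not the thesis's APDyna-Q: the function name, parameters such as `planning_steps`, and the environment interface are assumptions for illustration only. In the asynchronous setting, several actor threads would apply such updates against shared tables.

```python
import random
from collections import defaultdict

def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.1, gamma=0.95, planning_steps=5):
    """One Dyna-Q update: direct RL, model learning, then planning."""
    # Direct RL: one-step Q-learning update from the real transition.
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions)
                          - Q[(s, a)])
    # Model learning: record the last observed outcome of (s, a).
    model[(s, a)] = (r, s_next)
    # Planning: replay transitions sampled from the learned model.
    for _ in range(planning_steps):
        (ps, pa), (pr, pn) = random.choice(list(model.items()))
        Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(pn, b)] for b in actions)
                                - Q[(ps, pa)])

# Usage: Q = defaultdict(float); model = {}
# then call dyna_q_step(...) after each real environment step.
```

The planning loop is what lets a Dyna-style agent reuse explored information many times per real step, which is the property the asynchronous variants above exploit.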
2. Research on an asynchronous reinforcement learning algorithm based on an improved framework. Existing ARL algorithms can solve discrete-space RL problems, but they have two shortcomings. First, they do not make full use of the global thread's information: in existing discrete-space ARL algorithms, the global thread is used only to update parameters, and its information is not fully exploited. Second, communication between threads leaves room for improvement, since threads could exchange information to speed up convergence. Based on these considerations, this thesis proposes a general improved ARL framework for discrete-space problems, which lets ARL algorithms solve discrete state-space problems efficiently and improves convergence performance. The framework is combined with four ARL algorithms, asynchronous Q-learning, asynchronous Sarsa, asynchronous Sarsa(λ), and asynchronous phased Dyna-Q, yielding four efficient ARL algorithms whose effectiveness is verified by experiments.

3. Research on an asynchronous reinforcement learning algorithm based on dynamic updating weights. In existing asynchronous deep RL algorithms, every thread pushes its updates to the global thread with a uniform learning rate, failing to account for the different information each thread transmits at each update. When an agent's update to the global thread is biased toward failure information, it contributes little to updating the learning system's parameters. This thesis therefore introduces dynamic weights into asynchronous deep RL and proposes a new algorithm, asynchronous advantage actor-critic with dynamic updating weights (DWA3C). DWA3C takes full account of the learning state of different threads: according to the content each agent pushes to the global thread, it dynamically updates the corresponding weights, significantly improving convergence efficiency and convergence performance. Experimental results show that DWA3C is effective: compared with traditional RL algorithms and existing ARL algorithms, it greatly improves convergence speed and convergence precision.
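To make the dynamic-weighting idea concrete, the sketch below scales each thread's gradient push to the global parameters by a weight derived from how its recent return compares to a running baseline. The weighting rule, function name, and parameters here are assumptions chosen purely for illustration; DWA3C's actual update scheme is the one defined in the thesis.

```python
import numpy as np

def push_weighted_update(global_params, grads, thread_return, baseline_return,
                         base_lr=1e-3, temperature=1.0):
    """Apply one thread's gradients to shared parameters with a dynamic weight.

    Hypothetical rule: threads whose episode return beats the running
    baseline push with a larger effective learning rate.
    """
    advantage = thread_return - baseline_return
    weight = 1.0 / (1.0 + np.exp(-advantage / temperature))  # squash into (0, 1)
    for p, g in zip(global_params, grads):
        p -= base_lr * weight * g  # in-place SGD step on the shared parameters
    return weight

# Usage: params = [np.zeros(4)]; grads = [np.ones(4)]
# push_weighted_update(params, grads, thread_return=1.2, baseline_return=0.5)
```

The design intent matches the motivation above: a push dominated by failure information gets a small weight and perturbs the global parameters less, while an informative push is applied more strongly.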
Keywords/Search Tags:reinforcement learning, asynchronous method, multithreading, improved framework, dynamic weights