
Reinforcement Learning Algorithm Study Based On ESN

Posted on: 2022-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
Full Text: PDF
GTID: 2518306488492504
Subject: Software engineering
Abstract/Summary:
In recent years, the field of artificial intelligence has developed by leaps and bounds, and new technologies and methods have emerged one after another. Among them, deep reinforcement learning, which takes advantage of the perceptual ability of neural networks, has attracted particular attention. This thesis takes the echo state network (ESN) as its research object and studies classic reinforcement learning algorithms based on ESN. The work covers three aspects. First, we improve the ESN online learning algorithm, which is traditionally optimized by recursive least squares (RLS), and propose a new mini-batch-based optimization algorithm, MRLS-ESN. Then, by combining the MRLS-ESN algorithm with two traditional policy control algorithms, Q-learning and Sarsa, we propose two new policy control algorithms, ESNRLS-Q and ESNRLS-Sarsa. Finally, we briefly discuss the application of the RLS-ESN optimization algorithm to the advantage actor-critic (A2C) algorithm.

ESNs are generally optimized by RLS. Although RLS converges quickly, it uses only one sample per iteration, which makes ESNs difficult to scale to large datasets. To tackle this problem, an ESN model for mini-batch sequences is presented, together with two optimization algorithms for it: stochastic gradient descent and Adam. A novel mini-batch RLS algorithm is then proposed to improve the training efficiency of the ESN model. On this basis, a regularization method is suggested for the proposed algorithm to avoid overfitting during ESN training. In addition, to make ESNs more suitable for time-varying tasks, an adaptive method for the forgetting factor of the proposed algorithm is introduced. Simulation results show that the proposed algorithm has faster processing speed and better convergence quality than the original RLS algorithm.

ESNs have the advantages of simplicity, ease of use, and high training efficiency. However, because the states visited by an agent are strongly correlated, it is difficult for ESN-based policy control algorithms to update the network parameters by RLS. To solve this problem, two new policy control algorithms, ESNRLS-Q and ESNRLS-Sarsa, are proposed. First, the leaky-integrator ESN and the mini-batch method are used in training to reduce the correlation among training samples. Second, the RLS autocorrelation matrix is updated by an average approximation method so that it is suited to processing mini-batch sequences. Third, the regularization method is applied to prevent overfitting. In addition, the Mellowmax method is adopted to calculate the target state-action values, which improves the convergence performance of the algorithms. Theoretical analysis and simulation experiments show that the proposed algorithms not only have lower computational complexity but also have better convergence performance.

In the A2C algorithm, the optimization of the critic network parameters is very important. To address this optimization problem, we propose an A2C algorithm based on RLS-ESN. First, an ESN is used to provide more useful information for training the critic network. Second, the RLS algorithm is used to optimize the relevant parameters in order to accelerate convergence. Finally, a comparison with the traditional gradient-based optimization algorithm verifies that the proposed algorithm is effective.
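
To make two of the building blocks named in the abstract concrete, the following is a minimal Python sketch of (a) a leaky-integrator ESN whose linear readout is trained with the standard one-sample-per-step RLS update, and (b) the Mellowmax operator. It is not the thesis's MRLS-ESN, ESNRLS-Q, or ESNRLS-Sarsa: the mini-batch update, the average approximation of the autocorrelation matrix, the regularization, and the adaptive forgetting factor are not reproduced here. The names `LeakyESN`, `rls_update`, `mellowmax`, and `run_sine_demo`, all hyperparameter values, and the sine-prediction task are assumptions chosen only for illustration.

```python
import numpy as np


class LeakyESN:
    """Leaky-integrator ESN with a linear readout trained by standard RLS.

    Illustrative baseline only; every default below is an assumed value.
    """

    def __init__(self, n_in, n_res, n_out, leak=0.3, spectral_radius=0.9,
                 forgetting=0.999, delta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale the fixed recurrent weights to the requested spectral radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_out = np.zeros((n_out, n_res))  # the only trained weights
        self.P = np.eye(n_res) / delta         # inverse autocorrelation estimate
        self.leak = leak
        self.lam = forgetting
        self.x = np.zeros(n_res)

    def step(self, u):
        """Advance the reservoir state by one input vector u."""
        pre = self.W_in @ u + self.W @ self.x
        self.x = (1.0 - self.leak) * self.x + self.leak * np.tanh(pre)
        return self.x

    def rls_update(self, target):
        """One RLS step on the current state; returns the a-priori error."""
        x = self.x
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)           # gain vector
        err = target - self.W_out @ x          # error before the update
        self.W_out += np.outer(err, k)
        # P stays symmetric, so outer(k, Px) equals k x^T P.
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err


def mellowmax(values, omega=5.0):
    """Mellowmax: log(mean(exp(omega * v))) / omega, computed stably (omega > 0)."""
    values = np.asarray(values, dtype=float)
    m = values.max()
    return m + np.log(np.mean(np.exp(omega * (values - m)))) / omega


def run_sine_demo(steps=500):
    """Online one-step-ahead prediction of a sine wave (hypothetical toy task)."""
    esn = LeakyESN(n_in=1, n_res=50, n_out=1)
    errors = np.empty(steps)
    for t in range(steps):
        esn.step(np.array([np.sin(0.2 * t)]))
        err = esn.rls_update(np.array([np.sin(0.2 * (t + 1))]))
        errors[t] = abs(err[0])
    return errors
```

Calling `run_sine_demo()` returns the per-step absolute prediction errors, which can be plotted to inspect convergence. Because `rls_update` consumes exactly one sample per call, it illustrates the scaling limitation that the abstract says motivates the mini-batch variant. `mellowmax` always lies between the mean and the maximum of its inputs, approaching the maximum as `omega` grows.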
Keywords/Search Tags: echo state networks, deep reinforcement learning, recursive least squares, mini-batch training, policy control algorithm, A2C algorithm