
Research On Online Reinforcement Learning Based On Sparse Representation

Posted on: 2021-02-25    Degree: Master    Type: Thesis
Country: China    Candidate: G Yang    Full Text: PDF
GTID: 2518306557968179    Subject: Computer technology
Abstract/Summary:
With increasing uncertainty in real-world applications, Online Reinforcement Learning (ORL) has attracted widespread attention for its fast learning and improved data efficiency. However, ORL usually suffers from complex Value Function Approximation (VFA) and from catastrophic interference, which make deep neural networks difficult to apply in a truly online algorithm. A good sparse representation can alleviate interference by updating only local parameters. Inspired by this, this thesis designs several ORL algorithms around sparse representation learning. The main work and innovations are as follows:

1. Aiming at the problems of sample utilization and computational complexity in deep reinforcement learning, the first contribution of this thesis is to reintroduce online mechanisms into deep reinforcement learning. First, to address catastrophic interference in online reinforcement learning, we propose an effective sparse representation, the Continuous Piecewise Neural Network (CPNN). Second, on this basis, we use Random Hybrid Optimization (RHO), which balances the semi-gradient and the residual gradient, to enhance the stability of the algorithm (a sketch of such a hybrid update follows this abstract), and we verify the stability of this optimization in the off-policy setting on the famous θ→2θ problem. In addition, we propose the Piecewise Deep Q-learning Network and verify its effectiveness on multiple Atari games.

2. To obtain an effective sparse representation, the second contribution of this thesis is to review the advantages and disadvantages of existing sparse representations and to summarize four competitive properties of an effective sparse representation: learnable, prior-free, non-truncated, and explicit. Furthermore, we introduce the attention mechanism into kernel-based value function approximation and propose a novel sparse representation, the attentive kernel-based model, in which the attention represents the degree of sparsity of the features. Within the framework of traditional Temporal Difference (TD) learning, we propose Online Attentive Kernel-Based Temporal Difference learning (OAKTD), and we derive the update equations of the attentive kernel-based function using the semi-gradient optimization method (see the sketch after this abstract). Experimental results on two classic control problems verify the effectiveness of the proposed algorithm.

3. Aiming at the problem that semi-gradient optimization is difficult to make converge and stabilize under attentive kernel-based approximation, the last contribution of this thesis is to adopt Two-Timescale Optimization (TTO), which separates learning into two processes: slow learning of the sparse representation and fast learning of the function approximation (see the step-size sketch after this abstract). We likewise derive the two-timescale update formulas of the attentive kernel-based model, and we give a theoretical proof of the algorithm's convergence via ODE-based analysis. Experimental results on several classic control problems verify the effectiveness of the proposed algorithm.
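As an illustration of the semi-gradient/residual-gradient balance behind RHO in contribution 1, here is a minimal sketch of a randomly mixed TD(0) update for a linear value function. The mixing probability p_residual, the function names, and the linear form are assumptions for illustration only; the thesis's actual RHO rule and the CPNN representation are not reproduced here.

```python
import numpy as np

def rho_td_update(w, phi_s, phi_s_next, reward, gamma, alpha, p_residual, rng):
    """One TD(0) step mixing semi-gradient and residual-gradient updates
    for a linear value function V(s) = w @ phi(s).

    With probability p_residual, the full residual gradient (which also
    differentiates through the bootstrapped target) is used; otherwise,
    the usual semi-gradient step is taken.
    """
    td_error = reward + gamma * (w @ phi_s_next) - (w @ phi_s)
    if rng.random() < p_residual:
        # Residual gradient of the squared TD error (1/2) * td_error**2
        grad = td_error * (gamma * phi_s_next - phi_s)
    else:
        # Semi-gradient: the bootstrapped target is treated as a constant
        grad = -td_error * phi_s
    return w - alpha * grad

# Example: one update on random features
rng = np.random.default_rng(0)
w = np.zeros(8)
w = rho_td_update(w, rng.random(8), rng.random(8),
                  reward=1.0, gamma=0.99, alpha=0.1, p_residual=0.5, rng=rng)
```

The residual branch follows the true gradient of the Bellman error and tends to be more stable off-policy, while the semi-gradient branch usually learns faster; randomly mixing the two is one simple way to trade off the two behaviors.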
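The attentive kernel-based model of contribution 2 can be sketched as follows: Gaussian kernel features are gated by a sigmoid attention, and both the weights and the attention parameters take semi-gradient TD(0) steps. The sigmoid gate, the RBF kernel, and all names here are illustrative assumptions; the thesis derives its own update equations.

```python
import numpy as np

def kernel_features(s, centers, bandwidth):
    """Gaussian kernel features k(s, c_i) = exp(-||s - c_i||^2 / (2 b^2))."""
    diff = centers - s                        # (n_centers, state_dim)
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * bandwidth ** 2))

def attentive_semi_gradient_td(w, theta, s, r, s_next, gamma,
                               centers, bandwidth, alpha_w, alpha_theta):
    """Semi-gradient TD(0) step for the attentive kernel model
    V(s) = sum_i sigmoid(theta_i) * w_i * k(s, c_i).
    """
    k = kernel_features(s, centers, bandwidth)
    k_next = kernel_features(s_next, centers, bandwidth)
    a = 1.0 / (1.0 + np.exp(-theta))          # attention gates in (0, 1)
    delta = r + gamma * np.dot(a * w, k_next) - np.dot(a * w, k)  # TD error
    # Semi-gradient: only V(s) is differentiated, the target is frozen
    w_new = w + alpha_w * delta * a * k                        # dV/dw_i = a_i k_i
    theta_new = theta + alpha_theta * delta * w * k * a * (1.0 - a)  # sigmoid chain rule
    return w_new, theta_new
```

A gate near zero effectively removes a kernel feature from the approximation, so each step updates only a local subset of parameters; this is one way to read "the attention represents the sparse degree of features" and how sparsity mitigates interference.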
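For the two-timescale scheme of contribution 3, the key is that the representation parameters move on a slower timescale than the approximation weights. Below is a minimal sketch of step-size schedules with that property; the exponents and constants are illustrative assumptions, not the thesis's choices, and they could drive the alpha_w and alpha_theta arguments of the previous sketch.

```python
def two_timescale_stepsizes(t, c_w=0.5, c_theta=0.5):
    """Illustrative step-size schedules for two-timescale optimization:
    the VFA weights w learn fast, the sparse representation theta slowly.

    Both schedules satisfy the Robbins-Monro conditions (sums diverge,
    sums of squares converge), and alpha_theta(t) / alpha_w(t) -> 0,
    the separation condition assumed in ODE-based convergence analyses.
    """
    alpha_w = c_w / (t + 1) ** 0.6        # fast timescale
    alpha_theta = c_theta / (t + 1) ** 0.9  # slow timescale
    return alpha_w, alpha_theta
```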
Keywords/Search Tags:Online Reinforcement Learning, Value Function Approximation, Catastrophic Interference, Sparse Representation, Attention Mechanism, Two-Timescale Optimization, Temporal Difference Learning