
Research On Online Reinforcement Learning Based On Sparse Representation

Posted on: 2021-02-25    Degree: Master    Type: Thesis
Country: China    Candidate: G Yang    Full Text: PDF
GTID: 2518306557968179    Subject: Computer technology
Abstract/Summary:
With increasing uncertainty in real-world applications, Online Reinforcement Learning (ORL) has attracted widespread attention for its fast learning and improved data efficiency. However, ORL usually suffers from complex Value Function Approximation (VFA) and from catastrophic interference, which make deep neural networks difficult to apply in a truly online algorithm. A good sparse representation can alleviate interference by updating only local parameters. Inspired by this, this thesis designs several ORL algorithms around sparse representation learning. The main work and innovations are as follows:

1. Aiming at the problems of sample utilization and computational complexity in deep reinforcement learning, the first contribution of this thesis is to reintroduce online mechanisms into deep reinforcement learning. First, to address catastrophic interference in online reinforcement learning, we propose an effective sparse representation, the Continuous Piecewise Neural Network (CPNN). Second, on this basis, we use Random Hybrid Optimization (RHO), which balances the semi-gradient and the residual gradient, to enhance the stability of the algorithm (a sketch of such a hybrid update follows this abstract), and we verify the stability of this optimization in the off-policy setting on the famous θ→2θ problem. In addition, we propose the Piecewise Deep Q-learning Network and verify its effectiveness on multiple Atari games.

2. To obtain an effective sparse representation, the second contribution of this thesis is to review the advantages and disadvantages of existing sparse representations and to summarize four competitive properties of an effective sparse representation: learnable, prior-free, non-truncated, and explicit. Furthermore, we introduce the attention mechanism into kernel-based value function approximation and propose a novel sparse representation, the attentive kernel-based model, in which the attention represents the degree of sparsity of the features. Within the framework of traditional Temporal Difference (TD) learning, we propose Online Attentive Kernel-Based Temporal Difference learning (OAKTD), and we derive the update equations of the attentive kernel-based function using the semi-gradient optimization method (see the sketch after this abstract). Experimental results on two classic control problems verify the effectiveness of the proposed algorithm.

3. Aiming at the problem that semi-gradient optimization is difficult to make converge and stabilize under attentive kernel-based approximation, the last contribution of this thesis is to adopt Two-Timescale Optimization (TTO), which separates learning into two processes: slow learning of the sparse representation and fast learning of the function approximation (see the step-size sketch after this abstract). We likewise derive the two-timescale update formulas of the attentive kernel-based model, and we give a theoretical proof of the algorithm's convergence via ODE-based analysis. Experimental results on several classic control problems verify the effectiveness of the proposed algorithm.
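As an illustration of the semi-gradient/residual-gradient balance behind RHO in contribution 1, here is a minimal sketch of a randomly mixed TD(0) update for a linear value function. The mixing probability p_residual, the function names, and the linear form are assumptions for illustration only; the thesis's actual RHO rule and the CPNN representation are not reproduced here.

```python
import numpy as np

def rho_td_update(w, phi_s, phi_s_next, reward, gamma, alpha, p_residual, rng):
    """One TD(0) step mixing semi-gradient and residual-gradient updates
    for a linear value function V(s) = w @ phi(s).

    With probability p_residual, the full residual gradient (which also
    differentiates through the bootstrapped target) is used; otherwise,
    the usual semi-gradient step is taken.
    """
    td_error = reward + gamma * (w @ phi_s_next) - (w @ phi_s)
    if rng.random() < p_residual:
        # Residual gradient of the squared TD error (1/2) * td_error**2
        grad = td_error * (gamma * phi_s_next - phi_s)
    else:
        # Semi-gradient: the bootstrapped target is treated as a constant
        grad = -td_error * phi_s
    return w - alpha * grad

# Example: one update on random features
rng = np.random.default_rng(0)
w = np.zeros(8)
w = rho_td_update(w, rng.random(8), rng.random(8),
                  reward=1.0, gamma=0.99, alpha=0.1, p_residual=0.5, rng=rng)
```

The residual branch follows the true gradient of the Bellman error and tends to be more stable off-policy, while the semi-gradient branch usually learns faster; randomly mixing the two is one simple way to trade off the two behaviors.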
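The attentive kernel-based model of contribution 2 can be sketched as follows: Gaussian kernel features are gated by a sigmoid attention, and both the weights and the attention parameters take semi-gradient TD(0) steps. The sigmoid gate, the RBF kernel, and all names here are illustrative assumptions; the thesis derives its own update equations.

```python
import numpy as np

def kernel_features(s, centers, bandwidth):
    """Gaussian kernel features k(s, c_i) = exp(-||s - c_i||^2 / (2 b^2))."""
    diff = centers - s                        # (n_centers, state_dim)
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * bandwidth ** 2))

def attentive_semi_gradient_td(w, theta, s, r, s_next, gamma,
                               centers, bandwidth, alpha_w, alpha_theta):
    """Semi-gradient TD(0) step for the attentive kernel model
    V(s) = sum_i sigmoid(theta_i) * w_i * k(s, c_i).
    """
    k = kernel_features(s, centers, bandwidth)
    k_next = kernel_features(s_next, centers, bandwidth)
    a = 1.0 / (1.0 + np.exp(-theta))          # attention gates in (0, 1)
    delta = r + gamma * np.dot(a * w, k_next) - np.dot(a * w, k)  # TD error
    # Semi-gradient: only V(s) is differentiated, the target is frozen
    w_new = w + alpha_w * delta * a * k                        # dV/dw_i = a_i k_i
    theta_new = theta + alpha_theta * delta * w * k * a * (1.0 - a)  # sigmoid chain rule
    return w_new, theta_new
```

A gate near zero effectively removes a kernel feature from the approximation, so each step updates only a local subset of parameters; this is one way to read "the attention represents the sparse degree of features" and how sparsity mitigates interference.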
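For the two-timescale scheme of contribution 3, the key is that the representation parameters move on a slower timescale than the approximation weights. Below is a minimal sketch of step-size schedules with that property; the exponents and constants are illustrative assumptions, not the thesis's choices, and they could drive the alpha_w and alpha_theta arguments of the previous sketch.

```python
def two_timescale_stepsizes(t, c_w=0.5, c_theta=0.5):
    """Illustrative step-size schedules for two-timescale optimization:
    the VFA weights w learn fast, the sparse representation theta slowly.

    Both schedules satisfy the Robbins-Monro conditions (sums diverge,
    sums of squares converge), and alpha_theta(t) / alpha_w(t) -> 0,
    the separation condition assumed in ODE-based convergence analyses.
    """
    alpha_w = c_w / (t + 1) ** 0.6        # fast timescale
    alpha_theta = c_theta / (t + 1) ** 0.9  # slow timescale
    return alpha_w, alpha_theta
```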
Keywords/Search Tags:Online Reinforcement Learning, Value Function Approximation, Catastrophic Interference, Sparse Representation, Attention Mechanism, Two-Timescale Optimization, Temporal Difference Learning