Stable Deep Reinforcement Learning

Posted on:2021-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:M He

Full Text:PDF

GTID:2428330623467880

Subject:Control Science and Engineering

Abstract/Summary:

Deep reinforcement learning combines deep learning and reinforcement learning. It inherits the strengths of both: the perceptual ability of deep learning and the decision-making ability of reinforcement learning. However, it also retains some of their weaknesses. For example, many value-based deep reinforcement learning algorithms inherit the overestimation and underestimation problems of the original reinforcement learning algorithms. Overestimation (underestimation) produces a positive (negative) bias and degrades the stability of the algorithm, and this thesis addresses that problem. The following work was carried out:

(1) The factors that affect the stability of reinforcement learning are analyzed from two aspects, bias and variance. The bias includes positive bias, negative bias, and delusional bias. The variance includes random variance and importance sampling variance. The thesis analyzes the causes of these factors and their impact on the stability of the algorithm, providing ideas for solving these problems.

(2) To address positive and negative bias, this thesis proposes a solution called the interleaved access method. The method has three parts. First, a new estimator, named the coupled estimator, is proposed. It balances the positive and negative biases produced by the maximum estimator and the double estimator, respectively, to improve the accuracy of the estimate. Second, a design method for the coupling rate of the coupled estimator is proposed, so that the coupling rate adapts to changes in the samples and performance improves. Third, an interleaved access scheme is introduced on top of the coupled estimator to further reduce its variance. The thesis refers to these three parts together as the interleaved access method.

(3) The interleaved access method is applied to a range of reinforcement learning and deep reinforcement learning algorithms, adapted to each algorithm, and a corresponding interleaved access version of each is proposed. The improved algorithms include Q-learning, Sarsa, and Expected Sarsa among the one-step temporal difference algorithms; n-step Sarsa and Sarsa(λ) among the n-step temporal difference algorithms; and deep Q-learning among the deep reinforcement learning algorithms. Finally, the performance of each algorithm, its double-structure version, and its interleaved-structure version is compared in different experimental environments and analyzed in detail. In these experiments, the interleaved access versions proposed in this thesis perform best.
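The bias trade-off described in part (2) can be illustrated with a small simulation. The sketch below is not the thesis's estimator: the abstract does not give its formula or the adaptive coupling rate, so a fixed weight `beta` that mixes the two classical estimators is assumed here purely for illustration. The function name, the true means, and the sample sizes are likewise hypothetical.

```python
import random


def sample_means(true_means, n, rng):
    """Average n unit-variance Gaussian draws for each variable."""
    return [sum(rng.gauss(mu, 1.0) for _ in range(n)) / n for mu in true_means]


def estimate_biases(trials=2000, n=5, beta=0.5, seed=0):
    """Average bias of three estimators of max_i E[X_i].

    beta is a hypothetical fixed coupling weight (an assumption of this
    sketch, not the adaptive coupling rate proposed in the thesis).
    """
    rng = random.Random(seed)
    true_means = [1.0, 0.9, 0.8, 0.7, 0.6]
    target = max(true_means)
    totals = {"max": 0.0, "double": 0.0, "coupled": 0.0}

    for _ in range(trials):
        # Two independent sample sets, as in a double estimator.
        a = sample_means(true_means, n, rng)
        b = sample_means(true_means, n, rng)

        # Maximum estimator: largest sample mean over all the data.
        pooled = [(x + y) / 2 for x, y in zip(a, b)]
        max_est = max(pooled)

        # Double estimator: pick the index on set A, evaluate it on set B.
        best_on_a = max(range(len(a)), key=a.__getitem__)
        double_est = b[best_on_a]

        # Assumed coupling: a convex combination of the two estimates.
        coupled_est = beta * max_est + (1.0 - beta) * double_est

        totals["max"] += max_est
        totals["double"] += double_est
        totals["coupled"] += coupled_est

    return {name: total / trials - target for name, total in totals.items()}


if __name__ == "__main__":
    for name, bias in estimate_biases().items():
        print(f"{name:8s} bias: {bias:+.3f}")
```

In this setup the maximum estimator should come out with a positive average bias and the double estimator with a negative one, while the mixed estimate falls between them. Choosing the weight well, and adapting it to the samples, is the part the thesis addresses and this sketch does not.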

Keywords/Search Tags:

Deep Reinforcement Learning, Reinforcement Learning, Positive Bias, Negative Bias, Stability

Related items

1	Research On Reinforcement Learning Methods Based On Bias-correction Of Value Function Estimation
2	Research On Maximization Bias Corrected Off-Policy Algorithms In Reinforcement Learning
3	Supervised Reinforcement Learning:methods And Applications
4	Research On Security Deep Reinforcement Learning Based On Experiences
5	Research On Reinforcement Learning Based Control Method Of Magnetic Navigation AGV
6	Research On Group Confrontation Strategies Based On Deep Reinforcement Learning
7	Research On The Recommendation Method Of Deep Reinforcement Learning With Negative Feedback
8	Research On Stock Trading Based On Deep Reinforcement Learning
9	Research On Deep Reinforcement Learning Algorithm In Continuous Action On Space
10	Research And Implementation Of Stock Quantitative Trading Algorithm Based On Deep Reinforcement Learning