Stable Deep Reinforcement Learning

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: M He
Full Text: PDF
GTID: 2428330623467880
Subject: Control Science and Engineering
Abstract/Summary:
Deep reinforcement learning combines deep learning and reinforcement learning. It inherits the strengths of both: the perceptual ability of deep learning and the decision-making ability of reinforcement learning. However, it also retains some of their weaknesses. For example, many value-based deep reinforcement learning algorithms inherit the overestimation and underestimation problems of the original reinforcement learning algorithms. Overestimation (underestimation) produces a positive (negative) bias and affects the stability of the algorithm, and this thesis addresses that problem. The following work was carried out:

(1) The factors that affect the stability of reinforcement learning are analyzed from two aspects, bias and variance. The bias includes positive bias, negative bias and delusional bias. The variance includes random variance and importance sampling variance. The thesis analyzes the causes of these factors and their impact on the stability of the algorithm, providing ideas for solving these problems.

(2) To address positive and negative bias, the thesis proposes a solution called the interleaved access method. The method has three parts. First, a new estimator, named the coupled estimators, is proposed. The coupled estimators balance the positive bias of the maximum estimator against the negative bias of the double estimators to improve the accuracy of the estimate. Second, a design method for the coupling rate of the coupled estimators is proposed, so that the coupling rate adapts to changes in the samples, which improves performance. Third, an interleaved access method is introduced on top of the coupled estimators to further reduce their variance. The thesis refers to these three parts together as the interleaved access method.

(3) The interleaved access method is applied, as appropriate to each case, to a range of reinforcement learning and deep reinforcement learning algorithms, yielding a new interleaved access version of each. The improved algorithms include Q-learning, Sarsa, and Expected Sarsa among one-step temporal difference algorithms; n-step Sarsa and Sarsa(λ) among n-step temporal difference algorithms; and deep Q-learning among deep reinforcement learning algorithms.

Finally, the original algorithms, their double structure versions, and their interleaved structure versions are compared in different experimental environments and analyzed in detail. The interleaved access versions proposed in this thesis perform best.
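
The positive and negative bias discussed above can be illustrated with a small Monte Carlo sketch. It estimates the largest of several true action values from noisy samples, using the maximum estimator (which tends to overestimate) and the double estimator (which tends to underestimate). The abstract does not define the coupled estimators or the adaptive coupling rate, so the `coupled_estimate` function below, a convex combination with a fixed `coupling_rate`, is only an assumed stand-in for illustration, not the method of the thesis. All function names, the true means, and the noise level are likewise hypothetical.

```python
import random
import statistics


def max_estimate(samples):
    """Maximum estimator: the largest sample mean over all actions."""
    return max(statistics.fmean(s) for s in samples)


def double_estimate(samples):
    """Double estimator: select the action on one half of the data,
    then evaluate that action on the other, independent half."""
    half_a = [s[: len(s) // 2] for s in samples]
    half_b = [s[len(s) // 2:] for s in samples]
    means_a = [statistics.fmean(s) for s in half_a]
    best = means_a.index(max(means_a))
    return statistics.fmean(half_b[best])


def coupled_estimate(samples, coupling_rate):
    """ASSUMPTION: a fixed convex combination of the two estimators.
    The thesis adapts the coupling rate to the samples; that rule is
    not given in the abstract and is not reproduced here."""
    return (coupling_rate * max_estimate(samples)
            + (1.0 - coupling_rate) * double_estimate(samples))


def estimate_biases(true_means, n_samples=10, noise_std=1.0,
                    coupling_rate=0.5, trials=5000, seed=0):
    """Average each estimator over many trials and subtract the true maximum."""
    rng = random.Random(seed)
    target = max(true_means)
    totals = {"max": 0.0, "double": 0.0, "coupled": 0.0}
    for _ in range(trials):
        # Draw n_samples noisy observations of every action value.
        samples = [[rng.gauss(mu, noise_std) for _ in range(n_samples)]
                   for mu in true_means]
        totals["max"] += max_estimate(samples)
        totals["double"] += double_estimate(samples)
        totals["coupled"] += coupled_estimate(samples, coupling_rate)
    return {name: total / trials - target for name, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical true action values; the true maximum is 1.0.
    for name, bias in estimate_biases([1.0, 0.8, 0.6, 0.4, 0.2]).items():
        print(f"{name:>8} estimator bias: {bias:+.3f}")
```

In this setting the maximum estimator's bias comes out positive and the double estimator's negative, while the combination falls between the two. Whether it lands near zero depends on the coupling rate, which is presumably why the thesis adapts that rate to the samples.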
Keywords/Search Tags: Deep Reinforcement Learning, Reinforcement Learning, Positive Bias, Negative Bias, Stability