Reinforcement learning enables an agent to learn skills through interaction with its environment. The agent tries to learn how to map the states of the environment to actions so as to maximize the rewards it receives from the environment. Deep learning has given reinforcement learning a large performance boost, but it has also introduced new problems. This thesis consists of three parts addressing three key problems in reinforcement learning algorithms: (1) the negative effects caused by approximation error; (2) heavy dependence on large amounts of samples; (3) the instability of RL algorithms.

The first part of the thesis comprises Chapters 3 and 4. Chapter 3 presents a theoretical analysis of the convergence of actor-critic methods and concludes that a sufficient condition for convergence is difficult to satisfy when function approximation is used for the value function. This means that the approximation error in the value function not only causes overestimation but also harms the convergence of the algorithms. Chapter 4 proposes an effective method to mitigate the approximation error in the value function. It derives an upper bound on the approximation error of the Q-function approximator and concludes that the error can be reduced by keeping every two consecutive policies similar during policy training. Based on this conclusion, a new RL algorithm called error-controlled actor-critic (ECAC) is proposed. Ablation studies verify the correctness of this conclusion, and comparative evaluations demonstrate that ECAC significantly outperforms other model-free RL algorithms.

The second part of the thesis (Chapter 5) proposes a robust sample-guided training method to reduce the amount of sampling required. To increase the robustness of the samples, noise is injected into the expert's actions while demonstrations are being collected. In addition, in contrast to the pre-training approach, the sample-guided method uses the robust samples to guide the whole training process rather than merely to initialize the policy parameters. Experimental results show that the noise-injected samples are more efficient and that the sample-guided training method outperforms the pre-training method.

The third part of the thesis (Chapter 6) is concerned with making reinforcement learning algorithms more stable. Because RL algorithms use only one agent to explore the environment, it is hard to guarantee sample diversity, which determines the quality of the samples. Furthermore, RL algorithms are sensitive to hyper-parameters. Hybridizing RL with evolutionary algorithms (EAs) is a reliable way to address both issues. Chapter 6 proposes a framework called competitive swarm reinforcement learning (CSRL), a hybrid of RL and EA, to ensure the robustness of RL algorithms. The framework runs RL and EA in alternation. Agents in the same swarm share samples, and the differences in their exploration behaviors ensure sample diversity. During RL training, different policies are trained with different hyper-parameters so as to make the algorithm insensitive to hyper-parameter settings. Comparative evaluations demonstrate that CSRL significantly outperforms other similar frameworks.
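To make the idea behind the first part concrete, the following is a minimal sketch of how a consecutive-policy similarity constraint could be realized in an actor-critic update. It assumes PyTorch, a Gaussian policy, and a KL-divergence penalty; the network sizes, coefficient, and names are illustrative assumptions, not the thesis's actual ECAC implementation.

```python
# Minimal sketch: penalize divergence between the previous (frozen) policy and
# the updated one so that every two consecutive policies stay similar.
# All architectures and constants here are illustrative assumptions.
import copy
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence


class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return Normal(self.net(obs), self.log_std.exp())


class QCritic(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def actor_loss(policy, old_policy, critic, obs, kl_coef=0.5):
    """Maximize the critic's value estimate while penalizing divergence from the
    previous (frozen) policy, which keeps consecutive policies similar."""
    dist = policy.dist(obs)
    actions = dist.rsample()                     # reparameterized sample, so gradients flow
    q_values = critic(obs, actions).squeeze(-1)  # critic's estimate of Q(s, a)
    with torch.no_grad():
        old_dist = old_policy.dist(obs)
    kl = kl_divergence(old_dist, dist).sum(-1)   # per-state KL between old and new policy
    return (-q_values + kl_coef * kl).mean()


# Illustrative usage: freeze a copy of the current policy before each update step.
obs_dim, act_dim = 8, 2
policy, critic = GaussianPolicy(obs_dim, act_dim), QCritic(obs_dim, act_dim)
old_policy = copy.deepcopy(policy).requires_grad_(False)
batch_obs = torch.randn(32, obs_dim)             # dummy batch of states
actor_loss(policy, old_policy, critic, batch_obs).backward()
```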
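For the second part, the sketch below illustrates one way noise could be injected into an expert's actions while demonstrations are collected. It assumes a Gymnasium-style environment API and zero-mean Gaussian noise; the function name and parameters are hypothetical and do not reproduce the thesis's exact procedure.

```python
# Minimal sketch: record expert demonstrations while perturbing each expert
# action with Gaussian noise, so the samples also cover states slightly off the
# expert's nominal trajectory. Names and defaults are illustrative assumptions.
import numpy as np


def collect_noisy_demonstrations(env, expert_policy, episodes=10, noise_std=0.1):
    """Roll out the expert, add zero-mean Gaussian noise to each of its actions,
    and store the visited transitions as the demonstration set."""
    demos = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = np.asarray(expert_policy(obs), dtype=np.float64)
            noisy_action = action + np.random.normal(0.0, noise_std, size=action.shape)
            next_obs, reward, terminated, truncated, _ = env.step(noisy_action)
            demos.append((obs, noisy_action, reward, next_obs))
            obs, done = next_obs, terminated or truncated
    return demos
```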
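For the third part, the following sketch shows the general shape of an alternating RL/EA loop over a swarm of policies that share one replay buffer and carry individual hyper-parameters. The policy methods (collect, rl_update, evaluate, mutate) and the truncation-style selection step are placeholders for illustration only; they are not the competitive-swarm update actually used in CSRL.

```python
# Minimal sketch of an alternating RL/EA loop; every identifier below is an
# illustrative assumption standing in for the framework's real components.
import random


def csrl_style_loop(swarm, hyper_params, shared_buffer, make_env, generations=100):
    """Alternate an RL phase (gradient updates on every policy, each with its own
    hyper-parameters, all feeding one shared buffer) with an EA phase (selection
    plus variation over the swarm)."""
    for _ in range(generations):
        # RL phase: each policy explores with its own settings; samples are shared.
        for policy, hp in zip(swarm, hyper_params):
            shared_buffer.extend(policy.collect(make_env(), steps=hp["explore_steps"]))
            policy.rl_update(shared_buffer.sample(hp["batch_size"]), lr=hp["lr"])

        # EA phase: evaluate every policy, keep the better half, refill by mutation.
        scores = [policy.evaluate(make_env()) for policy in swarm]
        ranked = [p for _, p in sorted(zip(scores, swarm), key=lambda pair: pair[0], reverse=True)]
        survivors = ranked[: len(ranked) // 2]
        offspring = [random.choice(survivors).mutate() for _ in range(len(swarm) - len(survivors))]
        swarm = survivors + offspring
    return swarm
```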