
Research On Continuous Robot Control Algorithms Based On Reinforcement Learning

Posted on: 2021-05-26    Degree: Master    Type: Thesis
Country: China    Candidate: L D Li    Full Text: PDF
GTID: 2518306020967259    Subject: Systems Engineering
Abstract/Summary:
With the development of artificial intelligence technology, continuous robot control has attracted enormous attention because of its significant role in real-world applications. However, such systems are hard to model accurately in practice. It is considerably difficult to find an appropriate control policy through traditional methods, and the designed control model is easily affected by noise, resulting in poor control performance. Consequently, learning robot control policies with data-driven methods has become a hot topic in current research. As promising approaches to finding the optimal control policy for continuous robot control tasks, policy gradient (PG) based model-free reinforcement learning (RL) methods can learn a control policy through interaction with a dynamic environment, without explicit modeling. However, they suffer from long training times and slow convergence. To accelerate convergence and training, this dissertation studies reinforcement learning based robot control algorithms for continuous spaces that learn control policies efficiently. A novel policy gradient algorithm with PSO-based parameter exploration (PG-PSOPE) is proposed, providing a simple but effective RL framework for learning the control policy of continuous robot control tasks. The main contributions are summarized in the following three aspects.

(1) To overcome the high gradient estimation variance caused by action-space exploration in PG-based RL methods on continuous robot control problems, we adopt exploration in parameter space instead of action space. In this way, policy exploration and policy update are conducted simultaneously, completely replacing the sophisticated policy gradient estimation, which easily falls into local optima. As a result, the control policy can be updated without gradient computation, avoiding the variance problem of gradient estimation and speeding up convergence.

(2) To speed up the training of continuous robot control policies, we first transform the optimization of the control policy into a continuous optimization problem over the control policy parameters, based on interaction with the dynamic environment. We then introduce the particle swarm optimization (PSO) algorithm into RL to explore the parameters of the control policy. Following PSO, a group of RL agents with different initial policy parameters is constructed to interact with the environment. In this way, the exploration and the update of the policy parameters are combined, avoiding complicated gradient computation and backpropagation and further speeding up training. Moreover, an additional probabilistic mutation operation effectively eases the problem of local optima, further accelerating convergence and training.

(3) Comparative experiments on the proposed PG-PSOPE, including the classical inverted pendulum control task as well as multiple robot locomotion tasks of varying complexity, are carried out in the OpenAI Gym simulation environments with the MuJoCo physics engine. The experimental results verify the effectiveness of the proposed algorithm, which not only accelerates convergence but also shortens training time, reducing training time by a factor of 1.92 to 58.
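The abstract's core idea, searching policy parameters directly with PSO so that no policy gradient is ever estimated, can be illustrated with a minimal sketch. This is not the author's PG-PSOPE implementation: the environment here is a hypothetical 1-D point-mass regulation task standing in for the MuJoCo benchmarks, the policy is a simple linear controller, and all hyperparameters (`n_particles`, inertia `w`, mutation rate `p_mut`, etc.) are illustrative assumptions. Each particle is one candidate parameter vector; its fitness is the episode return, so exploration and update happen together in parameter space, with a probabilistic mutation to ease local optima.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta, horizon=50, dt=0.1):
    """Toy stand-in environment (assumed, not from the thesis): a 1-D point
    mass with state [position, velocity] and linear policy u = -theta @ s.
    Returns the (negated quadratic-cost) episode return; no gradients used."""
    s = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(horizon):
        u = float(np.clip(-theta @ s, -2.0, 2.0))
        s = np.array([s[0] + dt * s[1], s[1] + dt * u])  # Euler integration
        total += -(s[0] ** 2 + 0.1 * u ** 2)
    return total

def pso_policy_search(n_particles=20, n_iters=60, dim=2,
                      w=0.7, c1=1.5, c2=1.5, p_mut=0.1):
    """PSO over policy parameters: fitness = episode return, so policy
    exploration and policy update are a single gradient-free step."""
    pos = rng.uniform(-1.0, 1.0, (n_particles, dim))   # candidate policies
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                  # per-particle bests
    pbest_f = np.array([episode_return(p) for p in pos])
    gbest = pbest[pbest_f.argmax()].copy()              # swarm best
    gbest_f = pbest_f.max()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        # Probabilistic mutation, as described in the abstract, to help
        # particles escape local optima.
        mut = rng.random(n_particles) < p_mut
        pos[mut] += rng.normal(0.0, 0.5, (int(mut.sum()), dim))
        f = np.array([episode_return(p) for p in pos])
        better = f > pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        if pbest_f.max() > gbest_f:
            gbest_f = pbest_f.max()
            gbest = pbest[pbest_f.argmax()].copy()
    return gbest, gbest_f

best_theta, best_return = pso_policy_search()
```

On this toy task the swarm reliably beats the do-nothing policy (return of about -50), which mirrors the abstract's point: replacing gradient estimation with parameter-space search avoids gradient variance entirely at the cost of more rollouts per update.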
Keywords/Search Tags:Continuous Control, Model-free Algorithm, Reinforcement Learning, Policy Gradient, Parameter Exploration