
Research And Implementation Of Agent Continuous Control Technology Based On Distributed Reinforcement Learning

Posted on: 2022-04-28  Degree: Master  Type: Thesis
Country: China  Candidate: T Q Xu  Full Text: PDF
GTID: 2518306566492214  Subject: Computer application technology

Abstract/Summary:
Deep reinforcement learning combines the decision-making ability of reinforcement learning with the perception of deep learning, realizing an end-to-end learning mode from input to output, and has a natural advantage in solving complex unmanned-equipment control problems. More and more unmanned-equipment control solutions are shifting from traditional control methods to deep reinforcement learning, but applying deep reinforcement learning to continuous control tasks faces the "curse of dimensionality". Continuous control tasks for intelligent unmanned equipment such as manipulators, simulated robots, and UAVs involve complex motion control problems; at the same time, factors such as joint friction, the rotational friction of joint twist, and voltage changes in UAV rotors lead to long reinforcement learning training times and make it difficult to converge to a robust model. As demand has developed, distributed reinforcement learning methods have attracted increasing attention from researchers, and parallel methods and high-performance computing frameworks are increasingly used to address the long training times of deep reinforcement learning.

Therefore, this thesis studies continuous control tasks and proposes a Learner-Actor asynchronous method based on importance sampling and a population-based evolutionary policy search method. On this basis, an extensible prototype system of a multi-learner, multi-actor reinforcement learning training framework is designed and implemented, and the results of this thesis are verified experimentally in simulation environments. The contributions of this thesis include the following three points:

(1) To address the long training time of continuous control tasks, a distributed Learner-Actor method based on importance sampling is proposed. Building on importance sampling and the V-trace method, states are processed and the action distribution is modeled through state feature encoding. Randomly sampling from this distribution enlarges the agent's exploration space, and replaying and transmitting the environment state as input stabilizes the action output, realizing an asynchronous sampled training method in continuous action spaces.

(2) To address the difficulty of policy search, a population-based evolutionary policy search method is proposed. Its core idea is to train multiple agents in parallel, periodically select the optimal agent, and combine it with the other agents to generate new individuals that form the next generation of the population; through these selection-and-evolution measures, training is driven toward the optimum. The algorithm improves the performance of off-policy algorithms through population-guided evolutionary strategy search, making it adaptable and extensible in complex continuous control tasks.

(3) Based on the above results, a prototype system of a multi-learner, multi-actor reinforcement learning training framework is designed and implemented, and the performance of the algorithms is tested through this prototype system. Experimental verification is carried out in a relatively complex quadrotor simulation environment and in a sparse-reward robot simulation environment closer to the real world. The results show that, compared with traditional reinforcement learning algorithms, the proposed algorithms achieve clear improvements in performance and robustness, with good scalability.
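The abstract does not give implementation details for its importance-sampling method, but the V-trace correction it builds on has a standard form (from the IMPALA line of work): off-policy trajectories collected by a behaviour policy are re-weighted with truncated importance ratios to produce value targets for the learner. A minimal sketch of that target computation, assuming NumPy arrays of per-step log-probabilities under the behaviour and target policies (all names and the array layout are illustrative):

```python
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets v_s for a length-T trajectory
    collected by behaviour policy mu, evaluated under target policy pi.
    All array arguments are 1-D of length T; bootstrap_value is V(x_T)."""
    rhos = np.exp(target_logp - behaviour_logp)          # pi/mu ratios
    clipped_rhos = np.minimum(rho_bar, rhos)             # truncated IS weights
    cs = np.minimum(c_bar, rhos)                         # trace-cutting coefficients
    values_tp1 = np.append(values[1:], bootstrap_value)  # V(x_{t+1})
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)
    vs_minus_v = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):              # backward recursion
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v                           # targets v_s
```

When the behaviour and target policies coincide, the ratios are 1 and the targets reduce to ordinary n-step returns; the clipping constants `rho_bar` and `c_bar` bound the variance introduced by off-policy data from asynchronous actors.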
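The population-based selection-and-evolution step described in contribution (2) can be sketched in a few lines. The following is an illustrative generation update, not the thesis's actual algorithm: parameter vectors are ranked by episode return, the best agents are kept as elites, and the rest of the population is refilled with Gaussian-mutated copies of elites (the function name, `elite_frac`, and `sigma` are assumptions for the sketch):

```python
import random

def evolve_population(population, fitness, elite_frac=0.25, sigma=0.02):
    """One generation of population-based policy search: keep the
    highest-fitness parameter vectors, refill the population by
    mutating elites with Gaussian noise."""
    ranked = sorted(zip(fitness, population),
                    key=lambda p: p[0], reverse=True)
    n_elite = max(1, int(len(population) * elite_frac))
    elites = [params for _, params in ranked[:n_elite]]
    next_gen = list(elites)                       # elitism: best agents survive
    while len(next_gen) < len(population):
        parent = random.choice(elites)
        child = [w + random.gauss(0.0, sigma) for w in parent]
        next_gen.append(child)                    # mutated offspring
    return next_gen
```

In a full system each parameter vector would parameterize an off-policy agent trained in parallel, with this selection step applied periodically to guide the population toward better policies.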
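The multi-learner, multi-actor framework of contribution (3) follows a common producer-consumer pattern: actors roll out trajectories with local policy copies and push them to a shared queue, while learners pull trajectories and perform updates. A minimal single-process sketch using threads (the rollout and the update are dummies; the real system would be distributed and would ship gradients or parameters as well):

```python
import queue
import threading

def run_actor(actor_id, traj_queue, n_episodes):
    """Actor: roll out episodes and ship trajectories to the learners."""
    for ep in range(n_episodes):
        trajectory = [(actor_id, ep, step) for step in range(3)]  # dummy rollout
        traj_queue.put(trajectory)

def run_learner(traj_queue, updates, n_traj):
    """Learner: consume trajectories and perform (stand-in) update steps."""
    for _ in range(n_traj):
        traj = traj_queue.get()       # blocks until an actor produces data
        updates.append(len(traj))     # placeholder for a gradient step

traj_queue = queue.Queue()
updates = []
actors = [threading.Thread(target=run_actor, args=(i, traj_queue, 2))
          for i in range(2)]
learner = threading.Thread(target=run_learner, args=(traj_queue, updates, 4))
for t in actors:
    t.start()
learner.start()
for t in actors:
    t.join()
learner.join()
```

Because actors only touch the queue, the same structure scales by adding actor processes or machines, which is what makes the asynchronous Learner-Actor training of contribution (1) extensible.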
Keywords/Search Tags:Continuous Control Task, Distributed Reinforcement Learning, Importance Sampling, Parallel Training, Policy Search