
Research On Intelligent Path Planning Of Manipulator Based On Reinforcement Learning

Posted on: 2022-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Liu
Full Text: PDF
GTID: 2518306524490844
Subject: Master of Engineering

Abstract/Summary:
Traditional path planning methods for manipulators usually require an accurate mathematical model, can only be used in a fixed task environment, and lack the ability to generalize. In recent years, Deep Reinforcement Learning (DRL) has made breakthroughs in game playing, and researchers have begun to explore the feasibility of applying DRL to manipulator control. Although DRL research in single-agent environments is maturing, there is still considerable room for development in multi-agent scenarios. Compared with the single-agent setting, the main source of instability in a multi-agent environment is that a change in any one agent's policy alters the environment as seen by the others, which makes the training networks difficult to converge. Against this background, this thesis takes the manipulator as its object of study and investigates path planning for pick-and-place tasks in an industrial environment. The main research contents are as follows.

First, this thesis analyzes the problems that arise when the maximum-entropy actor-critic algorithm Soft Actor-Critic (SAC) is applied to this task, focusing on the design of the reward function and the structure of the experience replay buffer. A compound reward strategy combining formal and distributed rewards is designed, and the replay buffer is improved by replacing uniform random sampling with a priority sampling method based on the maximum reward. A three-dimensional mechanical model of the single-manipulator path planning task is built in NX (Siemens simulation software), the improved SAC algorithm is applied to this simulation model, and the experimental results are compared and analyzed to verify the effectiveness of the improved algorithm.

Second, the SAC algorithm is combined with the Centralized Training, Decentralized Execution (CTDE) framework, extending the single-agent SAC algorithm to the multi-agent environment and effectively addressing the dynamic decision-making problem there. In this framework, the Critic network is stabilized by feeding it the observations of the other agents during training. Meanwhile, each agent has its own Actor network in a distributed design; when executing actions, a policy network only needs to consider its own environmental state.

Finally, under the CTDE framework an agent's Actor network can only access its own observations, so there is no effective communication between agents. This thesis therefore establishes an information-sharing mechanism among the agents and proposes the multi-agent SAC (MASAC) algorithm. Using the principle of the Gated Recurrent Unit (GRU), each agent writes its observations into a shared communication device and can also decode and read information from it; over successive iterations the agents continuously optimize the communication parameters to realize information exchange. Experiments show that MASAC performs well on the path planning task of a dual-manipulator system. Compared with the single-agent reinforcement learning method SAC, the proposed MASAC method greatly improves the success rate of the path planning task when the two manipulators cooperate.
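The abstract does not give pseudocode for the improved replay buffer. A minimal sketch of reward-based priority sampling, assuming priorities proportional to each transition's reward (the exact weighting scheme in the thesis is not specified), might look like:

```python
import random

class MaxRewardReplayBuffer:
    """Toy replay buffer that samples transitions with probability
    proportional to their (shifted) reward, as a stand-in for the
    thesis's priority sampling based on the maximum reward."""

    def __init__(self, capacity=10000, eps=1e-6):
        self.capacity = capacity
        self.eps = eps          # keeps the lowest-reward transition sampleable
        self.buffer = []

    def push(self, state, action, reward, next_state, done):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)  # drop the oldest transition
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Priority = reward shifted so the minimum weight is eps > 0;
        # higher-reward transitions are replayed more often.
        rewards = [t[2] for t in self.buffer]
        lo = min(rewards)
        weights = [r - lo + self.eps for r in rewards]
        return random.choices(self.buffer, weights=weights, k=batch_size)
```

Sampling with replacement via `random.choices` is a simplification; a production implementation would typically use a sum-tree for efficiency, as in standard prioritized experience replay.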
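The CTDE split described above can be sketched in a few lines. The linear actor and critic weights below are illustrative placeholders for the thesis's neural networks; only the information flow (joint inputs to the critic, local inputs to each actor) reflects the framework itself:

```python
import numpy as np

class CTDEAgents:
    """Centralized training, decentralized execution: each agent's
    actor sees only its own observation, while a single critic is
    trained on the joint observations and actions of all agents."""

    def __init__(self, n_agents, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Decentralized actors: one weight matrix per agent.
        self.actors = [rng.normal(scale=0.1, size=(act_dim, obs_dim))
                       for _ in range(n_agents)]
        # Centralized critic: scores the joint (obs, action) vector.
        joint_dim = n_agents * (obs_dim + act_dim)
        self.critic = rng.normal(scale=0.1, size=joint_dim)

    def act(self, observations):
        """Execution: each actor uses only its own observation."""
        return [np.tanh(W @ o) for W, o in zip(self.actors, observations)]

    def value(self, observations, actions):
        """Training: the critic sees every agent's obs and action."""
        joint = np.concatenate([np.concatenate([o, a])
                                for o, a in zip(observations, actions)])
        return float(self.critic @ joint)
```

Because the critic conditions on all agents' actions, the environment looks stationary from its perspective during training, which is the stabilizing effect the abstract refers to.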
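The GRU-based communication device can likewise be illustrated with a toy sketch. The single shared message vector, the weight shapes, and the random initialization below are assumptions for illustration, not the thesis's exact architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUChannel:
    """Toy communication device: an agent 'writes' by updating a
    shared message vector with a GRU-style gated update over its
    observation, and 'reads' by taking the current message."""

    def __init__(self, obs_dim, msg_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate, acting on [message, observation].
        self.Wz = rng.normal(scale=0.1, size=(msg_dim, msg_dim + obs_dim))
        self.Wr = rng.normal(scale=0.1, size=(msg_dim, msg_dim + obs_dim))
        self.Wh = rng.normal(scale=0.1, size=(msg_dim, msg_dim + obs_dim))
        self.message = np.zeros(msg_dim)  # the shared communication device

    def write(self, obs):
        """GRU update: gate how much of the old message survives."""
        x = np.concatenate([self.message, obs])
        z = sigmoid(self.Wz @ x)   # update gate
        r = sigmoid(self.Wr @ x)   # reset gate
        h = np.tanh(self.Wh @ np.concatenate([r * self.message, obs]))
        self.message = (1 - z) * self.message + z * h
        return self.message

    def read(self):
        """Any agent can read the current shared message."""
        return self.message.copy()
```

In the thesis the gate weights would be part of the learned communication parameters optimized during training; here they are fixed random values so the sketch stays self-contained.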
Keywords: Deep reinforcement learning, path planning, multi-agent cooperation, CTDE