
Research On Fast Training Method Of Robotic Arm Based On Deep Reinforcement Learning

Posted on: 2023-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: M Wang
Full Text: PDF
GTID: 2568306806492404
Subject: Engineering
Abstract/Summary:
With the development of technology, robots are used ever more widely across all walks of life. As a major branch of robotics, manipulators increasingly appear in scientific, medical, industrial, and other settings, where they play an important role. Among their functions, grasping is the main application requirement of the manipulator and has gradually become a research hotspot in manipulator control. Faced with increasingly complex tasks, traditional control methods can no longer meet the application requirements of manipulators. Deep Reinforcement Learning (DRL), which builds on reinforcement learning theory and combines it with deep learning, is one of the important research fields of machine learning. DRL aims to establish a general model that realizes autonomous learning and autonomous decision-making through interaction with the environment, and it can effectively solve the intelligent control problem of the manipulator. In recent years, therefore, researchers have increasingly combined DRL with manipulators for research and application in related fields.

However, because a real manipulator works in a three-dimensional environment, training it in three-dimensional space with existing DRL algorithms faces an overly large state space and action solution space, leading to long training cycles, excessive consumption of computing resources, and high training cost, which hinders the application and promotion of DRL theory in practice. Therefore, based on an analysis of the structure and motion modes of four different types of manipulators, and of how DRL algorithms are used for manipulator training, the training scheme and learning algorithm of the manipulator are studied in depth. A solution-space-oriented dimensionality reduction training method for manipulator deep reinforcement learning is proposed. The Deep Deterministic Policy Gradient (DDPG) algorithm is improved, and the Delay Update Policy Deep Deterministic Policy Gradient (DUP-DDPG) algorithm is proposed. This algorithm further improves the training efficiency of the manipulator, and the effectiveness of the proposed dimensionality reduction training method for manipulator obstacle avoidance is verified. The specific work is summarized as follows:

(1) A study of the solution-space-oriented dimensionality reduction training method for manipulator deep reinforcement learning. By decomposing the grasping task, the training of the manipulator's lateral steering gear and longitudinal steering gears is decoupled. According to the two different rotation modes of the steering gears, the grasping task is divided into two steps: determining the grasping direction, and moving the end effector toward the target along that direction. The grasping direction determines a longitudinal grasping plane, and the joints in that plane are then trained to reach the target object. To verify the effectiveness of the dimensionality reduction training method, a two-dimensional simulation environment was built, three reinforcement learning algorithms were used to analyze and verify the four reduced-dimension manipulators, and the network convergence of the different algorithms was compared. At the same time, a simulation model was built in the CoppeliaSim environment and communication with the algorithms was established; the four manipulators were trained in the 3D simulation environment with the same three algorithms, and their convergence was compared with that of the reduced-dimension manipulators. Dimensionality reduction greatly compresses the solution space and thus simplifies the training process while preserving the accuracy of action execution. A further attempt compressed the training target points from a two-dimensional plane to a straight line, and the differences from training in the plane were compared and analyzed.

(2) Research on and improvement of the DDPG algorithm. The DUP-DDPG algorithm is proposed to alleviate DDPG's overestimation of Q values: it performs a second value estimation on the same batch of samples, thereby delaying the update of the policy network. To learn from more valuable experience, the Prioritized Experience Replay (PER) algorithm is added, which effectively improves training efficiency. After training, the algorithm was migrated to a real manipulator: in physical experiments, the "Hi Arm" manipulator grasped four target objects of different sizes and shapes, and the convergence of the original and improved algorithms and their effect on grasping accuracy were compared. The experimental results show that the proposed method offers low training complexity, high speed, high accuracy, and low cost, with a grasping success rate of up to 98%. In addition, the settings of the state vector, reward function, and learning rate in the DDPG algorithm are analyzed for the problems at hand, the effects of different noises on action exploration are discussed, and the effects of target points of different sizes and in different regions on the training process are analyzed.

(3) Research on the dimensionality reduction training method for manipulator obstacle avoidance. The effect of the proposed dimensionality reduction method on manipulator obstacle avoidance is analyzed. According to the three-dimensional shape, position, and size of the obstacles, obstacles in the three-dimensional environment are mapped onto the grasping plane of the two-dimensional training environment. The DDPG algorithm is used to train the reduced-dimension manipulator with obstacles, and the joint angles output by training are then fed into the three-dimensional simulation environment. The convergence of the algorithm before and after adding obstacles in the two-dimensional environment is analyzed, and the real manipulator is used to grasp the target object in the presence of obstacles.
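The decomposition in (1) can be illustrated with a small geometric sketch: the lateral steering gear (base yaw) is solved analytically to select the vertical grasping plane, so the learning problem collapses from a 3D target to a 2D target inside that plane. The helper below is hypothetical and only illustrates the idea; the thesis does not publish its code.

```python
import math

def decompose_grasp(target_xyz):
    """Split a 3D grasp target into (yaw, planar target).

    yaw      -- lateral steering-gear angle that orients the grasping plane
    (r, z)   -- the reduced 2D target inside that vertical plane, which is
                all the RL agent controlling the longitudinal joints sees
    """
    x, y, z = target_xyz
    yaw = math.atan2(y, x)       # grasping direction, solved in closed form
    r = math.hypot(x, y)         # radial distance within the grasping plane
    return yaw, (r, z)
```

Under this reduction the agent's state and action spaces no longer grow with the lateral degree of freedom, which is the source of the training-time savings the abstract reports.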
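The PER component mentioned in (2) samples transitions in proportion to their TD error so that more informative experience is replayed more often. A minimal proportional-priority buffer can be sketched as follows; the `alpha` exponent and linear-scan sampling are simplifying assumptions (a practical implementation would use a sum-tree and importance-sampling weights).

```python
import random

class PrioritizedReplay:
    """Minimal proportional Prioritized Experience Replay (PER) sketch."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha            # how strongly priority skews sampling
        self.data = []
        self.prios = []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:   # evict oldest when full
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        # sample indices with probability proportional to stored priority
        idx = random.choices(range(len(self.data)),
                             weights=self.prios, k=batch_size)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # refresh priorities after the critic re-estimates the batch
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(e) + 1e-6) ** self.alpha
```

In a DUP-DDPG-style loop, the critic would re-estimate each sampled batch and call `update_priorities` before the delayed policy update is applied.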
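The obstacle mapping in (3) can be pictured as projecting each 3D obstacle into the 2D grasping plane chosen by the base yaw. The sketch below assumes spherical obstacles for simplicity (the abstract allows arbitrary shapes): an obstacle contributes a disc to the planar training environment only if it actually intersects the plane.

```python
import math

def project_obstacle(obs_center, obs_radius, yaw):
    """Map a spherical obstacle into the vertical grasping plane at angle yaw.

    Returns ((r, z), disc_radius) for the planar obstacle, or None if the
    sphere does not intersect the plane.  Hypothetical helper; the thesis
    maps obstacles by shape, position, and size but publishes no code.
    """
    x, y, z = obs_center
    # signed distance of the sphere centre from the grasping plane
    d = -x * math.sin(yaw) + y * math.cos(yaw)
    if abs(d) >= obs_radius:
        return None                              # sphere misses the plane
    r = x * math.cos(yaw) + y * math.sin(yaw)    # in-plane radial coordinate
    disc_radius = math.sqrt(obs_radius ** 2 - d ** 2)
    return (r, z), disc_radius
```

The resulting discs are what the reduced-dimension DDPG agent avoids during 2D training, and the learned joint angles are then replayed in the 3D simulation.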
Keywords/Search Tags: Deep reinforcement learning, Manipulator, Deep Deterministic Policy Gradient, Target grasping, Dimensionality reduction