Traditional robotic arm control methods are based on a precise mathematical model of the task and lack adaptability: when the environment or the task changes, the control performance degrades sharply or fails outright. In recent years, Deep Reinforcement Learning (DRL) has achieved remarkable success in game playing, robot control, and other fields, and it has consequently been introduced into robotic arm control, where Deep Deterministic Policy Gradient (DDPG) is the most widely used DRL algorithm. Like other DRL algorithms, DDPG is difficult to tune and to adapt to new tasks, and it additionally suffers from poor stability and low learning efficiency. This paper therefore studies the application of DDPG to robotic arm control, focusing on improving its learning efficiency in reaching and grasping tasks.

A two-dimensional simulation platform for a two-degree-of-freedom robotic arm is built first. With the joint angles as the state input, the DDPG algorithm that drives the arm to reach a target point is studied in depth. First, to mitigate the sparse-reward problem, the size of the target area is adjusted dynamically during training, which raises the probability that the arm reaches the target in the early training stage; compared with the original algorithm, the convergence time is significantly shortened. Second, to address the design of the reward function, a compound reward is proposed to accelerate training; it combines a single-step reward, an episode-level sparse reward, and a direction reward. Third, to overcome the low data-utilization efficiency of the replay memory, a prioritized experience replay pool is added: transitions are assigned priorities so that high-quality samples are drawn more often when the network weights are updated and the parameters are optimized, which improves efficiency over the original DDPG. Finally, a DDPG variant that integrates all three improvements is compared with several common continuous-control algorithms to verify their overall effect.
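The first two improvements can be made concrete with a minimal Python sketch. The function names, weight values, and annealing schedule below are illustrative assumptions, not the exact formulation used in the work:

```python
def compound_reward(dist, prev_dist, reached,
                    w_step=1.0, w_dir=0.5, r_episode=10.0):
    """Compound reward sketch: single-step, direction, and episode
    sparse terms. All weights are assumed values for illustration."""
    r_step = -w_step * dist                   # single-step reward: dense penalty on remaining distance
    r_dir = w_dir * (prev_dist - dist)        # direction reward: positive when moving toward the target
    r_sparse = r_episode if reached else 0.0  # episode sparse reward: bonus on entering the target area
    return r_step + r_dir + r_sparse

def target_radius(episode, r_init=0.30, r_final=0.05, decay_episodes=500):
    """Dynamically shrinking target area: a large area early in training
    makes initial successes likely, then the radius anneals to the true
    tolerance. The linear schedule is an assumption."""
    frac = min(episode / decay_episodes, 1.0)
    return r_init + frac * (r_final - r_init)
```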
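The prioritized replay pool can likewise be sketched briefly. The abstract does not specify the prioritization rule, so the proportional TD-error scheme below (in the spirit of Schaul et al.'s prioritized experience replay) is an assumption:

```python
import random

class PrioritizedReplay:
    """Proportional prioritized replay sketch; alpha and the TD-error
    proxy are illustrative assumptions."""

    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        # New transitions enter with high priority so each is sampled
        # at least once before its priority is refined.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        # Higher-priority (larger TD error) transitions are drawn more
        # often when updating the actor and critic networks.
        idxs = random.choices(range(len(self.buffer)),
                              weights=self.priorities, k=batch_size)
        return idxs, [self.buffer[i] for i in idxs]

    def update_priorities(self, idxs, td_errors):
        for i, e in zip(idxs, td_errors):
            self.priorities[i] = (abs(e) + self.eps) ** self.alpha
```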
On a three-dimensional simulation platform with a seven-degree-of-freedom robotic arm, grasping based on DDPG is then studied with images as the state input. First, the simulation platform is built in the ROS Gazebo environment with an added image sensor. Next, the grasping accuracy is compared when the left camera image alone, the right camera image alone, and both camera images together are used as input; the experimental results show that the dual-camera input resolves target occlusion and outperforms both single-camera cases. A compound reward function designed for this task further speeds up convergence, makes the grasping control more precise, and raises the grasp success rate. Finally, to save training time and computing resources, a control model trained to grasp at an arbitrary point within a specified range is transferred to a task with a fixed grasping point, and the simulation results verify the effectiveness of the transfer.

To address the large sample requirements and expensive computing resources of end-to-end deep reinforcement learning, a non-end-to-end combined scheme is also studied, in which reinforcement learning controls the robotic arm while deep learning extracts features and recognizes the target. First, a simulation experiment platform is designed around the RoboMaster scenario of striking the energy mechanism with a controlled robotic arm. Then, for the multi-source digit recognition problem, an improved convolutional neural network is proposed. Finally, the digit recognition result serves as the input signal of the reinforcement learning algorithm, and DDPG controls the arm. The experimental results indicate that the combined control scheme achieves better effectiveness and stability.
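The division of labor in the combined scheme can be sketched as follows. The CNN architecture, the input size, and the ddpg_actor and arm_state placeholders are hypothetical, since the abstract does not specify them:

```python
import torch
import torch.nn as nn

class DigitCNN(nn.Module):
    """Illustrative stand-in for the improved recognition network."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 8 * 8, num_classes)  # assumes 32x32 input images

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def control_step(frame, cnn, ddpg_actor, arm_state):
    """One step of the non-end-to-end scheme: deep learning recognizes
    the target, reinforcement learning controls the arm."""
    with torch.no_grad():
        digit = cnn(frame).argmax(dim=1)  # recognized target id, shape (batch,)
    # The recognition result is appended to the arm state and fed to the
    # trained DDPG policy, which outputs the joint command.
    state = torch.cat([arm_state, digit.float().unsqueeze(1)], dim=1)
    return ddpg_actor(state)
```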