With the continuous development of robot technology, the demands placed on robot intelligence keep rising. Traditional robot control algorithms have complex structures and a single operating mode: they suit only simple, repetitive application scenarios, cope poorly with complex and changeable unstructured scenes, and show little intelligence. How to free robots from purely mechanical tasks and enable them to explore and learn optimal task-execution strategies on their own, so as to adapt to more complex and variable scenarios, has become key to the development of intelligent robots. As a new generation of artificial intelligence technology, deep reinforcement learning provides a new solution for improving robot intelligence: through deep reinforcement learning, a robot can collect and train on data by itself, is no longer limited to routes designed by humans, and keeps optimizing its own policy during training until expectations are met, thereby acquiring a degree of intelligence and generalization ability.

In this paper, the Proximal Policy Optimization (PPO) algorithm is applied to the tracking and grasping scenarios of a vision-based robotic arm, and the success rate and efficiency of tracking and grasping are studied.

To address the sparse-reward problem of the PPO algorithm in the robotic-arm tracking application, a tracking fusion reward guidance mechanism is proposed to overcome the influence of sparse rewards. The designed tracking fusion reward function combines a trajectory correction reward, a core-area acceleration guidance reward, an abnormal-termination penalty, and a step-size adaptive reward (see the sketch below).

To address the sparse-reward problem of the PPO algorithm in the robotic-arm grasping application, a grasping fusion reward function is designed that guides the manipulator to first track to the area where the target object is located before performing the grasp, which overcomes the impact of sparse rewards to a certain extent.
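For concreteness, the composition of the tracking fusion reward might be sketched as follows. The abstract does not give the actual coefficients or thresholds, so the function name tracking_fusion_reward, the core-area radius, and all weights and penalty magnitudes below are illustrative assumptions, not the author's exact formulation.

```python
def tracking_fusion_reward(dist, prev_dist, step, max_steps,
                           core_radius=0.05, done_abnormal=False):
    """Hypothetical composition of the four tracking reward terms.

    dist / prev_dist : current and previous end-effector-to-target distance (m)
    step, max_steps  : current step index and the episode step limit
    core_radius      : radius of the assumed 'core area' around the target
    done_abnormal    : True if the episode terminated abnormally
    """
    # Trajectory correction reward: positive when the arm moves closer.
    r_traj = 1.0 if dist < prev_dist else -1.0

    # Core-area acceleration guidance: an extra bonus once inside the core
    # area, growing as the distance shrinks, to speed up final convergence.
    r_core = (core_radius - dist) / core_radius if dist < core_radius else 0.0

    # Abnormal-termination penalty.
    r_abnormal = -10.0 if done_abnormal else 0.0

    # Step-size adaptive reward: mild pressure to finish in fewer steps.
    r_step = -step / max_steps

    return r_traj + r_core + r_abnormal + r_step
```

The grasping fusion reward described above could reuse the same guidance terms to drive the arm into the target area, with the grasp outcome supplying the terminal reward; again, its exact form is not specified in this abstract.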
To realize the tracking and grasping applications of the vision-based manipulator, simulation environments for both tasks are built on the PyBullet simulation platform and combined with the PPO algorithm. The Actor network and the Critic network of PPO adopt a convolutional neural network structure to extract the environmental state information for tracking and grasping, and on this basis the PPO vision-based robotic-arm tracking and grasping systems are established.

Based on these systems, simulation experiments on tracking and grasping were carried out. The tracking experiments show that, compared with a sparse reward, the average tracking success rate under the tracking fusion reward guidance rises from about 92% to about 95%, an increase of about 3 percentage points, while the average number of tracking steps falls from more than 15 to around 8, nearly doubling the tracking efficiency. The grasping experiments show that, compared with a sparse reward, the average grasping success rate under the grasping fusion reward guidance increases from about 83% to about 86%, and the average number of steps required for a successful grasp decreases from about 56 to about 26. The robotic arm guided by the fusion rewards therefore performs better in both the tracking and grasping applications, in accuracy as well as efficiency.
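For completeness, the convolutional Actor and Critic networks mentioned above could be organized as in the following minimal PyTorch sketch. The shared encoder, the 84x84 RGB observation, the channel widths, and the action dimension act_dim are assumptions for illustration; the abstract states only that both networks use a convolutional structure.

```python
import torch
import torch.nn as nn

class ConvActorCritic(nn.Module):
    """Minimal sketch of CNN-based Actor and Critic networks for PPO.

    Assumes an 84x84 RGB camera observation and a continuous action of
    dimension act_dim (e.g., end-effector displacements); all sizes are
    illustrative, not the thesis's actual architecture.
    """

    def __init__(self, act_dim=4):
        super().__init__()
        # Shared convolutional encoder for the visual state (an assumption;
        # the Actor and Critic could equally use separate encoders).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
        )
        # Actor head: mean of a Gaussian policy over continuous actions.
        self.actor_mean = nn.Linear(512, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        # Critic head: scalar state-value estimate.
        self.critic = nn.Linear(512, 1)

    def forward(self, obs):
        # obs: (B, 3, 84, 84) image batch from the simulated camera.
        feat = self.encoder(obs)
        dist = torch.distributions.Normal(self.actor_mean(feat),
                                          self.log_std.exp())
        value = self.critic(feat)
        return dist, value
```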