Because of their high flexibility and low energy consumption, robot arms are widely used in the foundry, automobile, and other industrial fields. The limitations of traditional methods in control performance, together with the universality, unpredictability, and dynamics of typical application scenarios, have catalyzed research on intelligent manipulator control. With the development of deep reinforcement learning (DRL), agents can learn on their own without manually supplied environmental data. However, existing algorithms suffer, to varying degrees, from unstable policy updates or are applicable only to discrete action spaces, so when applied to manipulator control they are unstable in high-dimensional, complex environments. In view of these defects, this paper applies the proximal policy optimization (PPO) algorithm to manipulator trajectory control in unknown environments.

First, since the design of the reward and punishment function has an important influence on model convergence and on the robustness of trajectory control, this paper builds on the artificial potential field method and integrates a position reward incentive function and a direction function into a weighted reward function for the DRL manipulator control space, so as to enhance the learning efficiency and stability of the robot arm.

Secondly, a PPO deep reinforcement learning algorithm combined with LSTM is proposed and applied to trajectory control of the manipulator. The environment image is used as the input to maximize the practicability of the algorithm. The dimension of the environment input is reduced by an autoencoder, and an LSTM is introduced into the environment perception state space to give the agent effective prediction ability and to avoid ineffective learning; the PPO algorithm then performs the manipulator control. On the ROS simulation platform, comparing the PPO algorithm combined with LSTM against the actor-critic and plain PPO algorithms, the
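The weighted reward design described above can be illustrated with a minimal sketch. The weights (`W_POS`, `W_DIR`, `W_FIELD`) and potential-field gains (`K_ATT`, `K_REP`, `RHO_0`) here are hypothetical placeholders, not the coefficients actually used in this work; the sketch only shows the general shape of combining a position incentive, a direction term, and an artificial-potential-field term:

```python
import numpy as np

# Hypothetical weights and gains; the actual coefficients used in this work differ.
W_POS, W_DIR, W_FIELD = 1.0, 0.5, 0.8
K_ATT, K_REP, RHO_0 = 1.0, 0.5, 0.2  # attractive/repulsive gains, obstacle influence radius

def potential_field_reward(ee_pos, goal_pos, obstacle_pos):
    """Negative artificial-potential energy: attractive toward the goal,
    repulsive near obstacles (only inside the influence radius RHO_0)."""
    d_goal = np.linalg.norm(goal_pos - ee_pos)
    u_att = 0.5 * K_ATT * d_goal ** 2
    d_obs = np.linalg.norm(obstacle_pos - ee_pos)
    u_rep = 0.5 * K_REP * (1.0 / d_obs - 1.0 / RHO_0) ** 2 if d_obs < RHO_0 else 0.0
    return -(u_att + u_rep)  # lower potential -> higher reward

def weighted_reward(ee_pos, prev_pos, goal_pos, obstacle_pos):
    """Weighted sum of a position incentive (progress toward the goal),
    a direction term (cosine alignment of the step with the goal direction),
    and the potential-field term."""
    r_pos = np.linalg.norm(goal_pos - prev_pos) - np.linalg.norm(goal_pos - ee_pos)

    step, to_goal = ee_pos - prev_pos, goal_pos - prev_pos
    denom = np.linalg.norm(step) * np.linalg.norm(to_goal)
    r_dir = float(step @ to_goal) / denom if denom > 1e-8 else 0.0

    r_field = potential_field_reward(ee_pos, goal_pos, obstacle_pos)
    return W_POS * r_pos + W_DIR * r_dir + W_FIELD * r_field
```

With this shaping, a step toward the goal scores strictly higher than the mirror-image step away from it, which is the property that speeds up early exploration.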
average reward increases by 6.12% and 4%, respectively, and learning efficiency, stability, and other aspects of performance are improved. This demonstrates that the algorithm achieves high flexibility and robustness in complex environments and completes obstacle avoidance and grasping tasks more efficiently.

The manipulator control task is carried out on a two-dimensional manipulator visualization platform built on Gym and a three-dimensional visualization simulation platform built on ROS. In the two-dimensional environment, a static manipulator model is built with D-H modeling, and static and dynamic simulation environment scripts are then written from this model to render the manipulator environment. In the three-dimensional environment, a static model of the manipulator is first built in URDF format, with node data such as links added; motion control planning for the manipulator is then programmed through the MoveIt! interface, and the animation is rendered in the Gazebo and RViz visual environments. The manipulator completes obstacle avoidance and grasping control, verifying the performance of the proposed algorithm more intuitively.
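The D-H modeling step used for the two-dimensional platform can be sketched as follows. The link lengths and joint angles below are illustrative examples, not the parameters of the arm actually modeled; for a planar arm the offset `d` and twist `alpha` are zero:

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard Denavit-Hartenberg homogeneous transform for one link."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, link_lengths):
    """Chain per-link D-H transforms; d = alpha = 0 for a planar arm.
    Returns the end-effector position in the base frame."""
    T = np.eye(4)
    for theta, a in zip(joint_angles, link_lengths):
        T = T @ dh_transform(theta, 0.0, a, 0.0)
    return T[:3, 3]
```

For example, a two-link planar arm with unit links and both joints at zero places the end effector at (2, 0, 0); rotating the first joint by 90 degrees moves it to (0, 2, 0). A static model like this is what the simulation scripts render and animate.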