Because of their high flexibility and low energy consumption, robot arms are widely used in the foundry, automobile, and other industrial fields. The limitations of traditional methods in control performance, together with the universality, unpredictability, and dynamics of typical application scenarios, have catalyzed research on intelligent manipulator control. With the development of deep reinforcement learning (DRL), agents can learn on their own without manually supplied environmental data. However, existing algorithms suffer, to varying degrees, from unstable policy updates or are applicable only to discrete action spaces, so when applied to manipulator control they are unstable in high-dimensional, complex environments. In view of these defects, this paper applies the proximal policy optimization (PPO) algorithm to manipulator trajectory control in unknown environments.

First, since the design of the reward and punishment function has an important influence on model convergence and on the robustness of trajectory control, this paper builds on the artificial potential field method and integrates a position reward incentive function and a direction function into a weighted reward function for the DRL manipulator control space, so as to enhance the learning efficiency and stability of the robot arm.

Secondly, a PPO deep reinforcement learning algorithm combined with LSTM is proposed and applied to trajectory control of the manipulator. The environment image is used as the input to maximize the practicability of the algorithm. The dimension of the environment input is reduced by an autoencoder, and an LSTM is introduced into the environment perception state space to give the agent effective prediction ability and to avoid ineffective learning; the PPO algorithm then performs the manipulator control. On the ROS simulation platform, comparing the PPO algorithm combined with LSTM against the actor-critic and plain PPO algorithms, the
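The weighted reward design described above can be illustrated with a minimal sketch. The weights (`W_POS`, `W_DIR`, `W_FIELD`) and potential-field gains (`K_ATT`, `K_REP`, `RHO_0`) here are hypothetical placeholders, not the coefficients actually used in this work; the sketch only shows the general shape of combining a position incentive, a direction term, and an artificial-potential-field term:

```python
import numpy as np

# Hypothetical weights and gains; the actual coefficients used in this work differ.
W_POS, W_DIR, W_FIELD = 1.0, 0.5, 0.8
K_ATT, K_REP, RHO_0 = 1.0, 0.5, 0.2  # attractive/repulsive gains, obstacle influence radius

def potential_field_reward(ee_pos, goal_pos, obstacle_pos):
    """Negative artificial-potential energy: attractive toward the goal,
    repulsive near obstacles (only inside the influence radius RHO_0)."""
    d_goal = np.linalg.norm(goal_pos - ee_pos)
    u_att = 0.5 * K_ATT * d_goal ** 2
    d_obs = np.linalg.norm(obstacle_pos - ee_pos)
    u_rep = 0.5 * K_REP * (1.0 / d_obs - 1.0 / RHO_0) ** 2 if d_obs < RHO_0 else 0.0
    return -(u_att + u_rep)  # lower potential -> higher reward

def weighted_reward(ee_pos, prev_pos, goal_pos, obstacle_pos):
    """Weighted sum of a position incentive (progress toward the goal),
    a direction term (cosine alignment of the step with the goal direction),
    and the potential-field term."""
    r_pos = np.linalg.norm(goal_pos - prev_pos) - np.linalg.norm(goal_pos - ee_pos)

    step, to_goal = ee_pos - prev_pos, goal_pos - prev_pos
    denom = np.linalg.norm(step) * np.linalg.norm(to_goal)
    r_dir = float(step @ to_goal) / denom if denom > 1e-8 else 0.0

    r_field = potential_field_reward(ee_pos, goal_pos, obstacle_pos)
    return W_POS * r_pos + W_DIR * r_dir + W_FIELD * r_field
```

With this shaping, a step toward the goal scores strictly higher than the mirror-image step away from it, which is the property that speeds up early exploration.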
average reward increases by 6.12% and 4%, respectively, and learning efficiency, stability, and other aspects of performance are improved. This demonstrates that the algorithm achieves high flexibility and robustness in complex environments and completes obstacle avoidance and grasping tasks more efficiently.

The manipulator control task is carried out on a two-dimensional manipulator visualization platform built on Gym and a three-dimensional visualization simulation platform built on ROS. In the two-dimensional environment, a static manipulator model is built with D-H modeling, and static and dynamic simulation environment scripts are then written from this model to render the manipulator environment. In the three-dimensional environment, a static model of the manipulator is first built in URDF format, with node data such as links added; motion control planning for the manipulator is then programmed through the MoveIt! interface, and the animation is rendered in the Gazebo and RViz visual environments. The manipulator completes obstacle avoidance and grasping control, verifying the performance of the proposed algorithm more intuitively.
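The D-H modeling step used for the two-dimensional platform can be sketched as follows. The link lengths and joint angles below are illustrative examples, not the parameters of the arm actually modeled; for a planar arm the offset `d` and twist `alpha` are zero:

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard Denavit-Hartenberg homogeneous transform for one link."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, link_lengths):
    """Chain per-link D-H transforms; d = alpha = 0 for a planar arm.
    Returns the end-effector position in the base frame."""
    T = np.eye(4)
    for theta, a in zip(joint_angles, link_lengths):
        T = T @ dh_transform(theta, 0.0, a, 0.0)
    return T[:3, 3]
```

For example, a two-link planar arm with unit links and both joints at zero places the end effector at (2, 0, 0); rotating the first joint by 90 degrees moves it to (0, 2, 0). A static model like this is what the simulation scripts render and animate.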