
Path Planning Of Manipulator Based On Improved Reinforcement Learning

Posted on: 2024-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: S L Cai
Full Text: PDF
GTID: 2558307103969299
Subject: Electronic information

Abstract/Summary:
Path planning for space manipulators plays an important role in on-orbit tasks such as orbital maintenance, mechanical assembly, assisted refueling, and assisted docking. In recent years, manipulator path control has mainly been based on reinforcement learning, but the commonly used methods depend strongly on models of the manipulator and its environment, so in unknown, dynamic, and unstructured scenarios the manipulator can only operate in a preset way. Moreover, previous studies have not fully considered the effect of obstacles on grasping accuracy along the planned path, which can prevent the manipulator from completing its task as intended; collisions with obstacles also endanger the manipulator itself. To improve the task completion rate in obstacle scenarios and to ensure the manipulator's own safety, this thesis applies continuous-action reinforcement learning. The specific work is as follows:

(1) The Truncated Quantile Critics (TQC) algorithm, a continuous-action reinforcement learning method, is introduced to reduce the manipulator's dependence on an environment model. To address the discontinuity of TQC's parameter updates and the training inefficiency caused by the randomness of its regression error, the algorithm is optimized: the parameter iteration process is constrained with a Bayesian prior and a distributional distance to keep updates continuous, and the original linear quantile regression rule is replaced with quadratic (double) exponential smoothing to reduce the regression error. A mathematical derivation shows that the optimized TQC algorithm attains higher accuracy. A manipulator was designed on the SolidWorks platform and simplified to a six-degree-of-freedom model of joints and links. The motion environment, obstacles, and grasping targets were built on the MATLAB robotics simulation platform, and interface code linking Python and MATLAB scripts was written. The reinforcement learning algorithm itself was implemented and optimized in Python, the simulation and algorithm environments were unified, a fixed-obstacle scenario was set up, and path planning was carried out with the optimized TQC algorithm, which was evaluated on metrics such as grasping success rate and return. The experimental results show that the optimized algorithm completes both obstacle avoidance and target grasping; its average obstacle-avoidance and grasping success rates are 9.8% and 13.0% higher than those of the original TQC algorithm, and 12.8% and 17.1% higher than those of the DDPG algorithm.
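For reference, the sketch below illustrates two ingredients that part (1) builds on: the quantile-truncation step that gives TQC its name, and double (quadratic) exponential smoothing as one plausible reading of the smoothing used to damp the regression error. The tensor shapes, the per-critic drop count, and the coefficient `alpha` are illustrative assumptions, and the thesis's Bayesian-constraint modification is not reproduced here.

```python
import torch

def truncated_quantile_target(next_quantiles, reward, done,
                              gamma=0.99, drop_per_net=2):
    """TQC target: pool the quantile atoms of all critics, sort them,
    and drop the largest atoms to curb ensemble overestimation.

    next_quantiles: (batch, n_nets, n_quantiles) quantile estimates of
    each critic for the next state-action pair.
    """
    batch, n_nets, n_quantiles = next_quantiles.shape
    sorted_q, _ = torch.sort(next_quantiles.reshape(batch, -1), dim=1)
    n_keep = (n_quantiles - drop_per_net) * n_nets
    truncated = sorted_q[:, :n_keep]
    # Distributional Bellman backup over the kept atoms.
    return (reward.unsqueeze(1)
            + gamma * (1.0 - done.unsqueeze(1)) * truncated).detach()

def brown_double_smoothing(errors, alpha=0.3):
    """Brown's double (quadratic) exponential smoothing of a scalar
    error sequence -- a hypothetical stand-in for the thesis's
    'quadratic exponential smooth regression'."""
    s1 = s2 = errors[0]
    smoothed = []
    for e in errors:
        s1 = alpha * e + (1 - alpha) * s1   # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2  # second smoothing pass
        smoothed.append(2 * s1 - s2)        # Brown's level estimate
    return smoothed
```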
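The experiments couple a Python agent to a MATLAB simulation. A minimal sketch of one such bridge, assuming the MATLAB Engine API for Python, is shown below; the MATLAB functions `reset_env` and `step_env` are hypothetical names standing in for the thesis's actual scripts.

```python
import matlab.engine  # MATLAB Engine API for Python, bundled with MATLAB

class MatlabArmEnv:
    """Thin wrapper exposing a MATLAB manipulator simulation to a
    Python reinforcement learning agent."""

    def __init__(self, sim_dir):
        self.eng = matlab.engine.start_matlab()
        self.eng.addpath(sim_dir, nargout=0)  # make the .m files visible

    def reset(self):
        # Hypothetical MATLAB function returning the observation vector
        # (joint angles, obstacle pose, target pose).
        obs = self.eng.reset_env(nargout=1)
        return list(obs[0])

    def step(self, action):
        # Pass the continuous joint action as a MATLAB double array;
        # nargout=3 unpacks observation, reward, and done flag.
        obs, reward, done = self.eng.step_env(
            matlab.double(list(action)), nargout=3)
        return list(obs[0]), float(reward), bool(done)
```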
(2) A curriculum learning framework is applied to further improve the optimized TQC algorithm. To cope with complex dynamic-obstacle scenes, a curriculum-trained TQC model with staged network updates is established: instead of the critic directly receiving information from other agents, as in the original TQC algorithm, the iterated information is fed to the current agent's critic as part of its input. Using the same experimental setup as in (1), the coupling of curriculum learning and reinforcement learning was implemented in Python, and the model was applied to moving-obstacle scenes. Scenes in which obstacles change size and translate were designed on the MATLAB platform, and the algorithms were again compared on Q value, return, and grasping success rate. The experimental results show that the curriculum-optimized TQC algorithm raises the average obstacle-avoidance and grasping success rates against moving obstacles by 10.1% and 4.2% over the non-curriculum method. The algorithm can be applied to grasping tasks with obstacles and dynamic targets in real scenarios; the results are reliable and of practical value.
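As a sketch of how the curriculum in part (2) could be scheduled, the class below promotes training from static to moving obstacles once a recent success-rate threshold is met. The stage table, window size, and promotion threshold are illustrative assumptions, not the thesis's exact schedule.

```python
from collections import deque

class ObstacleCurriculum:
    """Stage-based curriculum: training starts with small, static
    obstacles and advances to larger, faster ones as the agent's
    recent grasp success rate clears a threshold."""

    STAGES = [          # (obstacle scale, obstacle translation speed)
        (0.5, 0.00),    # small and static
        (1.0, 0.00),    # full size, still static
        (1.0, 0.02),    # slow translation
        (1.0, 0.05),    # faster translation
    ]

    def __init__(self, window=100, threshold=0.8):
        self.stage = 0
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def report(self, success: bool):
        """Record one episode outcome and promote when ready."""
        self.results.append(success)
        full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / max(len(self.results), 1)
        if full and rate >= self.threshold and self.stage < len(self.STAGES) - 1:
            self.stage += 1
            self.results.clear()  # re-measure on the harder stage

    def params(self):
        """Current (scale, speed) to hand to the simulation."""
        return self.STAGES[self.stage]
```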
Keywords/Search Tags:Reinforcement Learning, Mechanical Arm, Obstacle Avoidance, Grasp