
Path Planning Of Manipulator Based On Improved Reinforcement Learning

Posted on: 2024-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: S L Cai
Full Text: PDF
GTID: 2558307103969299
Subject: Electronic information

Abstract/Summary:
Path planning for space manipulators plays an important role in on-orbit tasks such as orbital maintenance, mechanical assembly, assisted refueling, and assisted docking. In recent years, manipulator path control has mainly been based on reinforcement learning, but the commonly used methods depend strongly on models of the manipulator and its environment, so in unknown, dynamic, and unstructured scenarios the manipulator can only operate in a preset way. Moreover, previous studies have not fully considered the effect of obstacles on grasping accuracy along the planned path, which can prevent the manipulator from completing its task as intended; collisions with obstacles also endanger the manipulator itself. To improve the task completion rate in obstacle scenarios and to ensure the manipulator's own safety, this thesis applies continuous-action reinforcement learning. The specific work is as follows:

(1) The Truncated Quantile Critics (TQC) algorithm, a continuous-action reinforcement learning method, is introduced to reduce the manipulator's dependence on an environment model. To address the discontinuity of TQC's parameter updates and the training inefficiency caused by the randomness of its regression error, the algorithm is optimized: the parameter iteration process is constrained with a Bayesian prior and a distributional distance to keep updates continuous, and the original linear quantile regression rule is replaced with quadratic (double) exponential smoothing to reduce the regression error. A mathematical derivation shows that the optimized TQC algorithm attains higher accuracy. A manipulator was designed on the SolidWorks platform and simplified to a six-degree-of-freedom model of joints and links. The motion environment, obstacles, and grasping targets were built on the MATLAB robotics simulation platform, and interface code linking Python and MATLAB scripts was written. The reinforcement learning algorithm itself was implemented and optimized in Python, the simulation and algorithm environments were unified, a fixed-obstacle scenario was set up, and path planning was carried out with the optimized TQC algorithm, which was evaluated on metrics such as grasping success rate and return. The experimental results show that the optimized algorithm completes both obstacle avoidance and target grasping; its average obstacle-avoidance and grasping success rates are 9.8% and 13.0% higher than those of the original TQC algorithm, and 12.8% and 17.1% higher than those of the DDPG algorithm.
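For reference, the sketch below illustrates two ingredients that part (1) builds on: the quantile-truncation step that gives TQC its name, and double (quadratic) exponential smoothing as one plausible reading of the smoothing used to damp the regression error. The tensor shapes, the per-critic drop count, and the coefficient `alpha` are illustrative assumptions, and the thesis's Bayesian-constraint modification is not reproduced here.

```python
import torch

def truncated_quantile_target(next_quantiles, reward, done,
                              gamma=0.99, drop_per_net=2):
    """TQC target: pool the quantile atoms of all critics, sort them,
    and drop the largest atoms to curb ensemble overestimation.

    next_quantiles: (batch, n_nets, n_quantiles) quantile estimates of
    each critic for the next state-action pair.
    """
    batch, n_nets, n_quantiles = next_quantiles.shape
    sorted_q, _ = torch.sort(next_quantiles.reshape(batch, -1), dim=1)
    n_keep = (n_quantiles - drop_per_net) * n_nets
    truncated = sorted_q[:, :n_keep]
    # Distributional Bellman backup over the kept atoms.
    return (reward.unsqueeze(1)
            + gamma * (1.0 - done.unsqueeze(1)) * truncated).detach()

def brown_double_smoothing(errors, alpha=0.3):
    """Brown's double (quadratic) exponential smoothing of a scalar
    error sequence -- a hypothetical stand-in for the thesis's
    'quadratic exponential smooth regression'."""
    s1 = s2 = errors[0]
    smoothed = []
    for e in errors:
        s1 = alpha * e + (1 - alpha) * s1   # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2  # second smoothing pass
        smoothed.append(2 * s1 - s2)        # Brown's level estimate
    return smoothed
```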
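The experiments couple a Python agent to a MATLAB simulation. A minimal sketch of one such bridge, assuming the MATLAB Engine API for Python, is shown below; the MATLAB functions `reset_env` and `step_env` are hypothetical names standing in for the thesis's actual scripts.

```python
import matlab.engine  # MATLAB Engine API for Python, bundled with MATLAB

class MatlabArmEnv:
    """Thin wrapper exposing a MATLAB manipulator simulation to a
    Python reinforcement learning agent."""

    def __init__(self, sim_dir):
        self.eng = matlab.engine.start_matlab()
        self.eng.addpath(sim_dir, nargout=0)  # make the .m files visible

    def reset(self):
        # Hypothetical MATLAB function returning the observation vector
        # (joint angles, obstacle pose, target pose).
        obs = self.eng.reset_env(nargout=1)
        return list(obs[0])

    def step(self, action):
        # Pass the continuous joint action as a MATLAB double array;
        # nargout=3 unpacks observation, reward, and done flag.
        obs, reward, done = self.eng.step_env(
            matlab.double(list(action)), nargout=3)
        return list(obs[0]), float(reward), bool(done)
```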
(2) A curriculum learning framework is applied to further improve the optimized TQC algorithm. To cope with complex dynamic-obstacle scenes, a curriculum-trained TQC model with staged network updates is established: instead of the critic directly receiving information from other agents, as in the original TQC algorithm, the iterated information is fed to the current agent's critic as part of its input. Using the same experimental setup as in (1), the coupling of curriculum learning and reinforcement learning was implemented in Python, and the model was applied to moving-obstacle scenes. Scenes in which obstacles change size and translate were designed on the MATLAB platform, and the algorithms were again compared on Q value, return, and grasping success rate. The experimental results show that the curriculum-optimized TQC algorithm raises the average obstacle-avoidance and grasping success rates against moving obstacles by 10.1% and 4.2% over the non-curriculum method. The algorithm can be applied to grasping tasks with obstacles and dynamic targets in real scenarios; the results are reliable and of practical value.
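As a sketch of how the curriculum in part (2) could be scheduled, the class below promotes training from static to moving obstacles once a recent success-rate threshold is met. The stage table, window size, and promotion threshold are illustrative assumptions, not the thesis's exact schedule.

```python
from collections import deque

class ObstacleCurriculum:
    """Stage-based curriculum: training starts with small, static
    obstacles and advances to larger, faster ones as the agent's
    recent grasp success rate clears a threshold."""

    STAGES = [          # (obstacle scale, obstacle translation speed)
        (0.5, 0.00),    # small and static
        (1.0, 0.00),    # full size, still static
        (1.0, 0.02),    # slow translation
        (1.0, 0.05),    # faster translation
    ]

    def __init__(self, window=100, threshold=0.8):
        self.stage = 0
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def report(self, success: bool):
        """Record one episode outcome and promote when ready."""
        self.results.append(success)
        full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / max(len(self.results), 1)
        if full and rate >= self.threshold and self.stage < len(self.STAGES) - 1:
            self.stage += 1
            self.results.clear()  # re-measure on the harder stage

    def params(self):
        """Current (scale, speed) to hand to the simulation."""
        return self.STAGES[self.stage]
```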
Keywords/Search Tags:Reinforcement Learning, Mechanical Arm, Obstacle Avoidance, Grasp