As an important area of marine science and technology, autonomous operation of Underwater Vehicle Manipulator Systems (UVMS) has attracted the attention of a growing number of researchers and institutions. Reinforcement learning, as an effective artificial intelligence control method, has broad development prospects in the field of UVMS autonomous operation. Compared with traditional control methods, reinforcement learning can not only ensure the completion of autonomous tasks but also offer higher operating efficiency and better adaptability to complex underwater environments. This thesis analyzes and studies this subject in depth. The main research contents are as follows.

Based on the underwater grasping task, this thesis analyzes underwater autonomous operation tasks and establishes the UVMS coordinate frames. By analyzing the relationship between the generalized displacements and the generalized velocities of the vehicle and the manipulator, the kinematic model of the UVMS is established. The dynamic model is then established by analyzing the dynamics of the vehicle and the coupling effects between the vehicle and the manipulator. On this basis, a UVMS simulation model is built for the underwater robot and simulation experiment tasks for autonomous operation are designed, laying the foundation for research on reinforcement-learning-based autonomous operation of the underwater robot.

Because the state space of underwater grasping tasks is high-dimensional and continuous and its action space is also continuous, the applicability of reinforcement learning methods is analyzed and a policy-based reinforcement learning method is selected. To address the problem of sparse rewards during training, a reward shaping scheme based on the artificial potential field method is designed, and its effectiveness is verified by simulation experiments. Because traditional policy-based reinforcement learning algorithms are unstable and difficult to converge effectively during learning, the Proximal Policy Optimization (PPO) algorithm is adopted and improved by combining it with the actor-critic algorithm. The effect of the improved algorithm is verified by simulation experiments.

Finally, considering both simulation performance and the requirements of physical experiments, the state sampling scheme is further optimized based on human operating experience. With the final state sampling scheme, the underwater robot completes all simulation experiment tasks in the simulation environment with excellent results. By analyzing the differences between the simulation environment and the actual environment, the policy obtained from the final training is transferred to the underwater robot control program. The policy is successfully applied to the autonomous operation control of the vehicle and manipulator in physical experiments, and the experimental results are analyzed.
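The kinematic model described above relates generalized displacements to generalized velocities. For reference, the following sketch uses the form common in marine-robotics texts, written in assumed notation rather than the thesis's own equations. The symbols are defined as follows:

- η₁ = [x, y, z]ᵀ and η₂ = [φ, θ, ψ]ᵀ are the vehicle's earth-fixed position and ZYX Euler angles.
- ν₁ and ν₂ are its body-fixed linear and angular velocities, and q is the manipulator joint vector.
- R_B^I is the body-to-inertial rotation matrix.
- J is the (assumed) UVMS Jacobian mapping the stacked generalized velocity ζ to the end-effector velocity.

```latex
\begin{aligned}
\dot{\boldsymbol{\eta}} &=
\begin{bmatrix}\dot{\boldsymbol{\eta}}_1\\ \dot{\boldsymbol{\eta}}_2\end{bmatrix}
=
\begin{bmatrix}\mathbf{R}_B^I(\boldsymbol{\eta}_2) & \mathbf{0}_{3\times 3}\\
\mathbf{0}_{3\times 3} & \mathbf{J}_{k,o}(\boldsymbol{\eta}_2)\end{bmatrix}
\begin{bmatrix}\boldsymbol{\nu}_1\\ \boldsymbol{\nu}_2\end{bmatrix},
\qquad
\mathbf{J}_{k,o}(\boldsymbol{\eta}_2) =
\begin{bmatrix}
1 & \sin\phi\tan\theta & \cos\phi\tan\theta\\
0 & \cos\phi & -\sin\phi\\
0 & \sin\phi/\cos\theta & \cos\phi/\cos\theta
\end{bmatrix},\\
\boldsymbol{\zeta} &=
\begin{bmatrix}\boldsymbol{\nu}_1^{\mathsf T} & \boldsymbol{\nu}_2^{\mathsf T} & \dot{\mathbf{q}}^{\mathsf T}\end{bmatrix}^{\mathsf T},
\qquad
\dot{\mathbf{x}}_E = \mathbf{J}\!\left(\mathbf{R}_B^I, \mathbf{q}\right)\boldsymbol{\zeta}.
\end{aligned}
```

The Euler-angle mapping J_{k,o} is singular at θ = ±π/2. Implementations that must pass through that attitude often switch to quaternions.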
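The artificial-potential-field reward shaping mentioned above can be sketched as follows. The thesis's actual potential and gains are not given here, so the following are illustrative assumptions:

- a quadratic attractive term toward the goal;
- the classic repulsive term that is active within an influence distance d0;
- the gains k_att and k_rep.

The shaping term uses the potential-based form F = γΦ(s') − Φ(s) with Φ = −U. This form is known to leave the optimal policy of the original task unchanged.

```python
import numpy as np

def apf_potential(ee_pos, goal_pos, obstacles, k_att=1.0, k_rep=0.1, d0=0.5):
    """Artificial potential field U at the end-effector position.

    Gains k_att, k_rep and influence distance d0 are hypothetical values.
    """
    ee_pos = np.asarray(ee_pos, dtype=float)
    d_goal = np.linalg.norm(ee_pos - np.asarray(goal_pos, dtype=float))
    u = 0.5 * k_att * d_goal ** 2  # attractive term pulls toward the grasp target
    for obs in obstacles:
        d = np.linalg.norm(ee_pos - np.asarray(obs, dtype=float))
        if d <= d0:  # repulsion only acts inside the influence distance
            d = max(d, 1e-6)  # guard against division by zero
            u += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u

def shaped_reward(r_env, ee_pos, next_ee_pos, goal_pos, obstacles,
                  gamma=0.99, **apf_kwargs):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s), Phi = -U."""
    phi = -apf_potential(ee_pos, goal_pos, obstacles, **apf_kwargs)
    phi_next = -apf_potential(next_ee_pos, goal_pos, obstacles, **apf_kwargs)
    return r_env + gamma * phi_next - phi

# Usage: sparse environment reward is 0, but moving toward the goal is rewarded.
bonus = shaped_reward(0.0, [1.0, 0.0, 0.0], [0.5, 0.0, 0.0],
                      goal_pos=[0.0, 0.0, 0.0], obstacles=[], gamma=1.0)
```

With γ = 1 and no obstacles, moving from distance 1.0 to 0.5 gives 0.5 − 0.125 = 0.375. The step is rewarded even though the sparse task reward is still zero, which is the effect the shaping scheme is meant to provide.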
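The combination of PPO with an actor-critic structure can be illustrated with two pieces: the standard clipped surrogate objective, and generalized advantage estimation (GAE) computed from a learned critic's value estimates. This is the textbook PPO-Clip formulation, not the thesis's specific improvement. The hyperparameters (clip_eps, gamma, lam, vf_coef) are assumed common defaults. NumPy is used only to show the arithmetic; actual training needs an autodiff framework to differentiate these losses.

```python
import numpy as np

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation from critic values V(s_t).

    dones[t] = 1 means the episode ended after step t, so no bootstrapping
    from V(s_{t+1}) and the running advantage is reset.
    """
    T = len(rewards)
    adv = np.zeros(T)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]  # TD error
        running = delta + gamma * lam * nonterminal * running
        adv[t] = running
        next_value = values[t]
    returns = adv + np.asarray(values, dtype=float)  # critic regression targets
    return adv, returns

def ppo_losses(logp_new, logp_old, adv, values_pred, returns,
               clip_eps=0.2, vf_coef=0.5):
    """Actor (clipped surrogate) loss and weighted critic (MSE) loss."""
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    surr1 = ratio * adv
    surr2 = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    policy_loss = -np.mean(np.minimum(surr1, surr2))  # pessimistic bound
    value_loss = np.mean((returns - values_pred) ** 2)
    return policy_loss, vf_coef * value_loss

# Usage: a ratio of 1.5 with positive advantage is clipped to 1.2.
pl, vl = ppo_losses(np.array([np.log(1.5)]), np.array([0.0]),
                    np.array([1.0]), np.array([0.0]), np.array([0.0]))
```

Clipping the probability ratio bounds how far one update can move the policy. This is the mechanism PPO relies on for the more stable convergence that plain policy-gradient methods lack.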