As land resources are gradually depleted, increasing attention is being paid to marine resources, whose exploration is urgent. Equipping underwater platforms with manipulators for underwater operations is one of the mainstream trends in future marine development. Because the underwater working environment is complex and changeable, it is difficult to establish an accurate dynamic model of an underwater manipulator, and traditional control algorithms cannot meet the growing demands of underwater work. Deep reinforcement learning, a machine learning method that does not depend on a model of the problem, can learn and optimize a policy from experience by interacting with the environment. This thesis studies the application of deep reinforcement learning to the motion control of an underwater manipulator. The main research contents and innovative achievements of this thesis are as follows:

1. A four-degree-of-freedom underwater manipulator is analyzed using kinematics and dynamics theory. The kinematic model of the manipulator is established with the Denavit-Hartenberg (DH) method and its workspace is calculated. A dynamic model is established with the Lagrange method and the Morison formula, and the feasibility of building an underwater manipulator simulation environment in Webots is verified. The characteristics of different reinforcement learning algorithms are compared with respect to the underwater manipulator problem, and the deep deterministic policy gradient (DDPG) algorithm is selected to solve the motion control problems of the underwater manipulator.

2. A DDPG training simulation environment for manipulator obstacle-avoidance path planning is built in Matlab/Simulink. The characteristics of path planning for an underwater manipulator are analyzed, and an agent is designed accordingly. In the design of the state space and action space, the agent decides joint-angle changes directly from the observed state of the underwater manipulator. In the design of the reward function, the negative effect of sparse rewards is analyzed, and a segmented reward is designed for the path planning problem. A comparison of simulation training results verifies that the DDPG agent can directly learn a policy that controls joint changes to complete the obstacle-avoidance path planning task without relying on a kinematic model.

3. A DDPG training simulation environment for the underwater manipulator is built with Webots and Python. The characteristics of joint trajectory tracking for an underwater manipulator are analyzed, and agents are designed accordingly. In the design of the state space and action space, two schemes are developed: one in which the agent decides joint torques directly from the observed state of the underwater manipulator, and the DDPGPID algorithm, which combines DDPG with PID control and adjusts the PID parameters according to the observed state of the underwater manipulator. In the design of the reward function, a segmented reward suited to joint trajectory tracking is designed. A comparison of simulation results verifies that the DDPG agent is capable of outputting each joint torque directly from the joint motion parameters to complete the joint trajectory tracking task. In addition, combining the DDPG algorithm with a traditional control algorithm is shown to be more effective for motion control than applying either algorithm alone. The DDPGPID algorithm proposed in this thesis can effectively solve the joint trajectory tracking problem of the underwater manipulator.
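To make the DDPGPID idea and the segmented reward more concrete, the following is a minimal Python sketch for a single joint. It is illustrative only and is not the implementation used in this thesis: the function names, the error thresholds, the reward values and the gain bounds are all assumptions, since the abstract does not specify them. The sketch shows an actor output in [-1, 1] being mapped to PID gains, the PID law turning the tracking error into a torque, and a piecewise reward computed from the tracking error.

```python
from dataclasses import dataclass


def segmented_reward(error: float, near: float = 0.05, far: float = 0.5) -> float:
    """Piecewise reward on the absolute tracking error (rad).

    The thresholds and reward values are assumed for illustration.
    """
    e = abs(error)
    if e < near:
        return 1.0       # inside the tolerance band: fixed bonus
    if e < far:
        return -e        # middle band: penalty proportional to the error
    return -1.0 - e      # far band: additional fixed penalty


def action_to_gains(action, low=(0.0, 0.0, 0.0), high=(50.0, 5.0, 5.0)):
    """Map an actor output in [-1, 1]^3 to (kp, ki, kd) within assumed bounds."""
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    return tuple(
        lo + (a + 1.0) * 0.5 * (hi - lo)
        for a, lo, hi in zip(clipped, low, high)
    )


@dataclass
class PID:
    """Discrete PID controller whose gains are supplied at every step."""
    integral: float = 0.0
    prev_error: float = 0.0

    def torque(self, error: float, gains, dt: float) -> float:
        kp, ki, kd = gains
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative


# One control step for one joint, with a hypothetical actor output.
pid = PID()
gains = action_to_gains((0.2, -1.0, -1.0))   # stands in for the DDPG actor
tracking_error = 0.3                          # target angle minus measured angle
tau = pid.torque(tracking_error, gains, dt=0.01)
reward = segmented_reward(tracking_error)
```

In such a scheme the agent only tunes the gains while the PID law produces the torque, which keeps the control action structured even early in training. The direct-torque scheme described above would instead scale the actor output to a torque command without the PID stage.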