In recent years, with the increasing complexity of the tasks carried out by ships, higher requirements have been placed on the automatic control systems of underactuated ships. An automatic berthing system is an indispensable part of efficient and safe navigation, and with the development of unmanned surface vessel technology, establishing an efficient and accurate intelligent automatic berthing system is of great practical significance. Reinforcement learning has become a major research direction in artificial intelligence owing to its potential for solving complex control and decision-making problems. Reinforcement Learning from Demonstration (RLfD) methods, which combine reinforcement learning and imitation learning, can improve the speed and stability of agent training by using data provided by expert policies. Although RLfD has good prospects for practical application, the distribution mismatch problem must be addressed at the same time. Aiming at the automatic berthing problem of underactuated surface vessels, this paper designs two RLfD methods that combine Actor-Critic learning with model predictive control. The theoretical and simulation results show that the proposed methods converge well and effectively resolve the distribution mismatch problem. The simulations also show that the learning speeds of the RLfD algorithms are more than 50% faster than that of a typical model-free Actor-Critic algorithm.

The contributions and innovations of this paper are as follows:

(1) For the automatic berthing problem of underactuated ships, the problem is formulated as a Markov decision process on the basis of a mathematical model of the ship, and a reinforcement learning scheme is designed to solve it. Model-free Actor-Critic algorithms are applied to the ship berthing problem. Simulation results show that the reinforcement learning method can complete the automatic berthing task without relying on mathematical model information or motion planning.

(2) To address the slow convergence of the model-free Actor-Critic algorithm, an RLfD method combined with model predictive control is proposed. To overcome the problems of insufficient expert data and sub-optimal expert policies, an interactive expert controller combining Actor-Critic with model predictive control is designed; it provides the agent with expert data and improves its own performance in step with the agent's learning. For the distribution mismatch problem of the proposed RLfD method, two improvement schemes are proposed on the basis of theoretical analysis. Simulations confirm the effectiveness of the proposed RLfD method and its variants: compared with the model-free Actor-Critic method, training is faster and learning efficiency is higher.

(3) Through theoretical analysis, the original reinforcement learning problem is reformulated as a constrained optimal control problem, and on the basis of the RLfD framework the SGAC algorithm is proposed. The algorithm lets the agent interact with the environment while the expert is responsible only for providing guidance online. In the training phase, the dual gradient method is used to solve the optimization problem, and the convergence of the proposed method is analyzed theoretically. Tests in the established automatic berthing simulation environment show that the SGAC algorithm solves the distribution mismatch problem, with a more stable learning process and faster convergence. Compared with the model-free Actor-Critic algorithm, the berthing trajectory obtained by SGAC is also smoother.
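As a minimal sketch of the dual gradient (dual ascent) idea mentioned in contribution (3), the following toy example solves a simple constrained problem by alternating a primal minimization of the Lagrangian with a projected gradient-ascent step on the multiplier. The problem, step size, and variable names here are illustrative only and are not taken from the thesis:

```python
# Dual ascent sketch for: minimize f(x) = x^2  subject to  g(x) = 1 - x <= 0.
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x), with lam >= 0.
def dual_ascent(steps=200, lr=0.5):
    lam = 0.0  # Lagrange multiplier for the inequality constraint
    x = 0.0
    for _ in range(steps):
        # Primal step: closed-form minimizer of L over x (dL/dx = 2x - lam = 0)
        x = lam / 2.0
        # Dual step: gradient ascent on lam along g(x), projected onto lam >= 0
        lam = max(0.0, lam + lr * (1.0 - x))
    return x, lam

x_opt, lam_opt = dual_ascent()
# Converges to the KKT point x = 1, lam = 2.
```

In the thesis's setting the primal variable is the policy and the constraint encodes the expert guidance, so the primal step is itself an iterative update rather than a closed-form solve, but the alternating primal/dual structure is the same.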