| With the rapid development of the shipping industry, the frequency of ship traffic accidents in the waters of the Straits has gradually increased. This thesis takes strait waters as its main research setting and adopts a deep reinforcement learning algorithm, a class of methods that performs well in complex environments, incorporating features such as compliance with the international rules for collision avoidance, avoidance manoeuvres of different magnitudes for ships of different sizes, and return to the route after collision avoidance. To speed up convergence, an enhanced truncation function is designed to improve the algorithm's learning rate, and a multi-process architecture is used to implement the algorithm in parallel. The result is a collision avoidance algorithm that complies with the international rules for collision avoidance and the return-to-route requirements of strait waters and avoids ships of different sizes by different magnitudes, which improves the safety of ship navigation in strait waters. The main research elements of this thesis are as follows: (1) The features of existing intelligent algorithms are analysed, and the deep reinforcement learning algorithm best suited to the application scenario of this thesis is selected. To verify that the selected Proximal Policy Optimization algorithm performs better on the ship collision avoidance problem, the three-ship encounter situation, the most frequent in strait waters, is simulated in a Python environment. The Q-learning algorithm and the Proximal Policy Optimization algorithm are both run in the simulation, with the collision avoidance rule reward function and the steering amplitude reward function combined. Comparing the reward curves of the two algorithms and the collision hazard curves of their planned paths shows that the Proximal Policy Optimization algorithm performs better on the ship collision avoidance problem.
(2) In view of the complex navigational environment in strait waters, the ship avoidance boundary is designed to be positively correlated with ship size, so that a ship boundary distance reward function can be designed to avoid ships of different sizes and improve navigational safety in strait waters. To meet the requirement that ships follow the specified route when there is no risk of collision, a return-to-route reward function is designed, which adds return to the route after collision avoidance to the algorithm. To improve the algorithm's applicability to scenarios in which other ships appear, two endpoints are designed for the obstacle ships to reach successively at random, simulating the emergence of other ships during training. To address slow convergence on complex problems, an enhanced truncation function is designed to improve the algorithm's learning rate, and a Distributed Proximal Policy Optimization algorithm is implemented using multiple processes. (3) The Distributed Proximal Policy Optimization algorithm designed in this thesis is simulated and compared with the Q-learning algorithm. The evaluation indexes are the minimum distance between the ship and the strait boundary during collision avoidance, the collision risk, and the distance between the ship and the end of the route together with the angle between the ship and the route at that time. The results verify that the algorithm avoids ships of different sizes, keeps a safe distance from the strait boundary while complying with the international rules for collision avoidance, and returns to the route after collision avoidance. |