Underwater vehicles are important tools for exploring the ocean and conducting underwater operations. Autonomous underwater vehicles (AUVs) have become a research focus in recent years because they are untethered and safe, among other advantages. AUVs play a key role in marine monitoring, data collection, target tracking, and hunting. As the temporal and spatial complexity of tasks increases, a single AUV can no longer meet task requirements, and researchers have begun to study multi-AUV systems. Compared with a single AUV, a multi-AUV system has significant advantages in efficiency, robustness, and flexibility when performing tasks. Multi-AUV cooperative hunting is a key problem in the study of multi-AUV systems. It involves two relationships: cooperation among the AUVs, and competition between the AUVs and the hunted target. Its core technologies include cooperative control, obstacle avoidance, and real-time path planning. In recent years, deep reinforcement learning has achieved good results in robot control. Applying deep reinforcement learning algorithms to multi-AUV systems can improve both the cooperative control among the AUVs and their ability to compete against the hunted target during cooperative hunting. This paper analyzes the multi-AUV cooperative hunting problem, abstracts it, and combines it with deep reinforcement learning. It proposes two multi-AUV cooperative hunting algorithms, one based on deep deterministic policy gradient (DDPG) and one based on multi-agent deep deterministic policy gradient (MADDPG), and designs and implements experimental schemes for both. Simulation and comparison of the two algorithms show that deep reinforcement learning suffers from a sparse reward problem when controlling multi-AUV cooperative hunting. To address the sparsity of external rewards, this paper proposes a multi-AUV cooperative hunting algorithm based on the intrinsic motivation of strategic influence, which promotes cooperation and exploration among the AUVs by providing effective internal rewards, accelerates convergence, and thereby alleviates the negative impact of sparse rewards. To address the difficulty of obtaining rewards in complex models, a model training method based on a curriculum learning mechanism is proposed: the complex model is decomposed into multiple related, simpler sub-models on which the algorithm is trained, improving the final performance of multi-AUV cooperative hunting. Finally, the simulation results show that the multi-AUV cooperative hunting algorithm based on the intrinsic motivation of strategic influence significantly improves training convergence speed. While preserving that convergence speed, the model training method based on the curriculum learning mechanism improves obstacle avoidance during multi-AUV cooperative hunting and, in turn, the efficiency of the hunt. In addition, the experiments show that training with the curriculum learning mechanism improves the ability of both the AUVs and the hunted target to exploit the environment when optimizing their strategies.
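The two remedies for sparse rewards described above can be illustrated with a minimal sketch. It is not the paper's implementation: the weighting factor `beta`, the success-rate `threshold`, the stage names, and the helpers `shaped_reward` and `Curriculum` are all hypothetical, and the intrinsic term is passed in as a plain number rather than computed from any influence measure.

```python
from dataclasses import dataclass


def shaped_reward(extrinsic: float, intrinsic: float, beta: float = 0.1) -> float:
    """Add a weighted internal bonus to a (possibly sparse) external reward.

    beta is a hypothetical weighting factor, not a value from the paper.
    """
    return extrinsic + beta * intrinsic


@dataclass
class Curriculum:
    """Step through related sub-models ordered from simple to complex."""
    stages: list             # hypothetical stage identifiers, easiest first
    threshold: float = 0.8   # hypothetical success rate required to advance
    index: int = 0

    @property
    def current(self):
        """Return the stage the agents are currently trained on."""
        return self.stages[self.index]

    def update(self, success_rate: float) -> bool:
        """Advance one stage if the threshold is met; return True on advance."""
        if success_rate >= self.threshold and self.index < len(self.stages) - 1:
            self.index += 1
            return True
        return False


if __name__ == "__main__":
    # Assumed stage names, for illustration only.
    curriculum = Curriculum(stages=["open_water", "few_obstacles", "dense_obstacles"])
    print(curriculum.current)
    curriculum.update(success_rate=0.9)  # meets the threshold, so the stage advances
    print(curriculum.current)
    print(shaped_reward(extrinsic=0.0, intrinsic=2.0, beta=0.5))
```

In a training loop under these assumptions, `shaped_reward` would stand in for the raw environment reward at each step, and `Curriculum.update` would be called after each evaluation window to decide whether to move to the next, harder sub-model.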