Multi-AUV Distributed Collaborative Target Search Based On Deep Reinforcement Learning

Posted on: 2023-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Xu
GTID: 2558307061453354
Subject: Control theory and control engineering
Abstract/Summary:
The autonomous underwater vehicle (AUV) is an important tool for exploring and developing the ocean. Compared with a single AUV, a multi-AUV system can complete more complex underwater tasks through cooperation. However, multi-AUV cooperative control faces several problems: an uncertain underwater environment, limited communication between AUVs, and the difficulty of precisely controlling a single AUV under local observation. With the continuing development of artificial intelligence, machine learning is increasingly applied in the field of control. Reinforcement learning in particular does not rely on an accurate model of the system; it learns behavioral policies from the agent's interaction experience with the environment and adapts well to that environment. This thesis therefore applies deep reinforcement learning to multi-AUV distributed cooperative control. First, autonomous path planning and high-precision trajectory tracking control of a single AUV under local observation and time-varying disturbance are studied. Then, a reinforcement learning algorithm and training framework are designed for controller training in multi-AUV cooperative target search, a typical multi-AUV collaborative control scenario. Finally, distributed collaborative control of multiple AUVs is realized. The main work and innovations of this thesis are as follows:

(1) For the local path planning problem of a single AUV, an improved deep deterministic policy gradient algorithm, DDPG_3C, is proposed. It averages over multiple value networks and introduces a prioritized experience replay mechanism, which overcomes the value network overfitting and low training efficiency of the traditional deep deterministic policy gradient (DDPG) algorithm. The local path planning problem is formulated as a partially observable Markov decision process over the environmental state, and a path planning controller is trained with the DDPG_3C algorithm. A simulation environment based on the actual dynamic model of the AUV is then built. The experiments verify that the proposed algorithm achieves better path planning performance and faster training than DDPG, and they also verify its generalization across different path planning tasks.

(2) For trajectory tracking of a single AUV under unknown perturbation, an AUV trajectory tracking controller training framework based on reinforcement learning is constructed using the DDPG_3C algorithm above. First, the linear and circular trajectory tracking tasks are converted into MDP models suitable for reinforcement learning. The state space, reward function, and other elements are designed according to the characteristics of each task, and the corresponding neural network structure is then built. Comparison with the traditional DDPG algorithm on both the linear and circular trajectory tracking tasks verifies the efficiency of the training framework and the generalization performance of the trained model.

(3) For the multi-AUV collaborative target search problem, a distributed controller training framework for multi-AUV systems is designed. It builds on the reinforcement-learning-based precise control of a single AUV described above and combines a multi-agent reinforcement learning algorithm with the idea of centralized training and distributed execution. The cooperative control task under limited communication between AUVs is formulated as a multi-agent partially observable Markov decision process, and a recurrent neural network is used to extract the temporal information in the AUVs' behavior, so that efficient cooperative control of multiple AUVs is achieved even when the environmental information is not completely observable. For efficient distributed control of multiple AUVs, an improved multi-agent deep deterministic policy gradient algorithm, MADDPG_3C, is proposed, and a two-stage training method is combined with centralized training and distributed execution to construct the corresponding training framework. Finally, a simulation environment is built for the two sub-tasks of multi-AUV cooperative obstacle avoidance and cooperative target search. The experimental results show that the trained reinforcement learning controller can efficiently complete the multi-AUV cooperative target search task.
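
The two modifications named for DDPG_3C, averaging over multiple value networks and prioritized experience replay, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the use of three critic estimates, the proportional priority rule, and the values of `gamma`, `alpha`, and `eps` are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def averaged_td_target(reward, next_q_values, gamma=0.99, done=False):
    """TD target that uses the mean of several critics' next-state estimates.

    `next_q_values` holds one Q estimate per target critic (hypothetical
    setup; the abstract does not state how many critics are averaged).
    """
    mean_next_q = sum(next_q_values) / len(next_q_values)
    return reward + (0.0 if done else gamma * mean_next_q)


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i = (|delta_i| + eps)^alpha / sum_j(...).

    This is one common prioritization rule, assumed here for illustration.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, rng=random):
    """Draw replay-buffer indices (with replacement) weighted by priority."""
    probs = sampling_probabilities(td_errors)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

Averaging the critics' estimates is intended to damp the error of any single value network, and priority-weighted sampling replays transitions with large TD error more often, which is one way such a mechanism could speed up training.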
Keywords/Search Tags: autonomous underwater vehicle, reinforcement learning, path planning, trajectory tracking, collaborative target search