| With the advancement of society and the continuous growth of military power in many countries, UAV swarms will play an important role in both civil and military fields. In the civil field, UAV swarm reconnaissance and surveillance can be applied to environmental monitoring, power-line inspection, disaster relief, and traffic monitoring. In the military field, UAV swarms are expected to replace traditional reconnaissance and surveillance modes in future intelligent warfare. Swarm reconnaissance and surveillance is highly complex because it combines two tasks: coverage search of an area and tracking and surveillance of discovered targets. In addition, the reconnaissance area is uncertain and dynamic, and traditional algorithms lack effective methods for modeling and solving this problem. In this paper, based on a typical cooperative reconnaissance scenario for a UAV swarm, a task model consisting of four sub-models is constructed according to the model's data inputs and outputs. A multi-agent partially observable Markov decision process is used to formally describe the swarm's reconnaissance process, and deep reinforcement learning is introduced to solve the swarm task-decision problem. Drawing on current advanced methods for multi-agent self-synchronization and cooperation, a multi-agent autonomous cooperative framework based on deep reinforcement learning is built. The typical autonomous cooperative reconnaissance process of the UAV swarm is then analyzed in terms of the task's factors. Building on this framework, two autonomous cooperative reconnaissance methods for UAV swarms are proposed: one based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG) and one based on a bi-directional coordination gradient. The main research work of this paper is as follows. First, a multi-agent autonomous cooperative framework based on deep reinforcement learning is built.
At present, UAV swarm control technology has made some progress, but research on distributed task coordination within swarms is still in its infancy, and there is still no effective way to model and solve the class of complex tasks that perform area coverage search and target tracking and monitoring simultaneously. This paper draws on recent advances in machine learning, particularly deep reinforcement learning, applies them to the autonomous cooperative reconnaissance mission of UAV swarms, and builds a distributed autonomous cooperative framework based on multi-agent deep reinforcement learning. Second, an autonomous cooperative reconnaissance method for UAV swarms based on MADDPG is proposed, and a reconnaissance network that supports a variable number of UAVs is constructed. The method is optimized through theoretical analysis combined with reward design, parameter tuning, and observation-space design. It emphasizes the autonomy of each UAV: it can cope with complex dynamic environments and achieve self-synchronized collaboration, so that the swarm jointly completes area coverage search and target tracking and monitoring. The validity, autonomy, synergy, and robustness of the MADDPG-based method are verified in a simulation environment. Third, an autonomous cooperative reconnaissance method for UAV swarms based on a bi-directional coordination gradient is proposed. A bi-directional RNN connects the UAVs, providing both information exchange between UAVs and a swarm-level memory. The paper builds a network for the whole swarm that strengthens swarm intelligence and outputs each UAV's action online in real time.
This method is likewise optimized through theoretical analysis combined with reward and penalty design, parameter tuning, and observation-space design. It emphasizes the overall collaboration of the swarm: it can handle rapid changes in the environment and achieve self-synchronized collaboration, so that the swarm jointly completes area coverage search and target tracking and monitoring. Its validity, autonomy, synergy, and robustness are also verified in a simulation environment. Finally, the two proposed methods are compared with other advanced cooperative methods, and the method based on the bi-directional coordination gradient performs better in the dual tasks of area coverage search and target tracking and monitoring. |
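For reference, the formal model and the first method's training rule can be written out in their generic textbook form. The block below is an illustrative sketch, not the thesis's own formulation: it gives a standard multi-agent POMDP tuple and the standard MADDPG critic and actor updates. The symbols ($\mathcal{N}$ agents, joint state or observation vector $x$, replay buffer $\mathcal{D}$, target networks marked with a prime) are assumptions chosen for illustration, and the thesis's specific reward, observation, and network design is not reflected here.

```latex
% Multi-agent POMDP (generic form; symbols are illustrative assumptions)
\mathcal{M} = \langle \mathcal{N}, \mathcal{S}, \{\mathcal{A}_i\}_{i \in \mathcal{N}}, P,
  \{r_i\}_{i \in \mathcal{N}}, \{\mathcal{O}_i\}_{i \in \mathcal{N}}, Z, \gamma \rangle

% MADDPG: centralized critic Q_i, decentralized deterministic actor \mu_i
\begin{align*}
y_i &= r_i + \gamma \, Q_i^{\mu'}\!\left(x', a_1', \dots, a_N'\right)\Big|_{a_j' = \mu_j'(o_j')} \\
\mathcal{L}(\theta_i) &= \mathbb{E}_{x, a, r, x' \sim \mathcal{D}}
  \left[\left(Q_i^{\mu}(x, a_1, \dots, a_N) - y_i\right)^2\right] \\
\nabla_{\theta_i} J(\mu_i) &= \mathbb{E}_{x, a \sim \mathcal{D}}
  \left[\nabla_{\theta_i} \mu_i(o_i) \,
  \nabla_{a_i} Q_i^{\mu}(x, a_1, \dots, a_N)\Big|_{a_i = \mu_i(o_i)}\right]
\end{align*}
```

The critic $Q_i$ conditions on every agent's action during training, while each actor $\mu_i$ needs only its local observation $o_i$ at execution time. This centralized-training, decentralized-execution split is what lets each UAV act autonomously once trained.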
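The idea of using a bi-directional RNN as the link between UAVs can be illustrated with a toy example. The function below is a minimal sketch under heavy simplifying assumptions, not the thesis's network. The name `bidirectional_agent_rnn` and the scalar weights `w_in`, `w_h`, and `w_out` are hypothetical, hidden states are scalars, and both directions share weights. It treats the UAVs as a sequence and runs a forward and a backward recurrent pass over it, so each UAV's output depends on every other UAV's observation.

```python
import math
from typing import List


def bidirectional_agent_rnn(
    obs: List[float], w_in: float = 0.5, w_h: float = 0.8, w_out: float = 1.0
) -> List[float]:
    """Toy bi-directional RNN across agents (hypothetical, scalar states).

    Agents are ordered as a sequence; a forward pass carries information
    from agent 0 toward agent n-1, a backward pass carries it the other way.
    Both directions share the same (assumed) weights for simplicity.
    """
    n = len(obs)

    # Forward pass: hidden state accumulates information from earlier agents.
    fwd = [0.0] * n
    h = 0.0
    for i in range(n):
        h = math.tanh(w_in * obs[i] + w_h * h)
        fwd[i] = h

    # Backward pass: hidden state accumulates information from later agents.
    bwd = [0.0] * n
    h = 0.0
    for i in reversed(range(n)):
        h = math.tanh(w_in * obs[i] + w_h * h)
        bwd[i] = h

    # Each agent's output combines both directions, so it sees the whole swarm.
    return [math.tanh(w_out * (f + b)) for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    # Changing only the last agent's observation changes the first agent's output.
    print(bidirectional_agent_rnn([1.0, 0.0, 0.0]))
    print(bidirectional_agent_rnn([1.0, 0.0, 5.0]))
```

Because the same recurrent weights are applied at every position, the sketch runs unchanged for any swarm size. A practical network would instead use vector hidden states, separate weights per direction, and learned parameters.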