| With the rapid development of information technology and artificial intelligence, the modern battlefield has shifted from mechanized warfare to electronic and intelligent warfare. On the modern battlefield, one of the key tasks of a multi-agent system formed by various cooperating units is target tracking. This task faces two challenges. On the one hand, the maneuvers and motion intentions of an enemy target become harder to infer as the target grows more intelligent. On the other hand, the difficulty of generating a tracking policy increases exponentially with the number of intelligent units. As one of the most typical units in modern warfare, the UAV swarm has become a focus of arms competition among the world's military powers. In a swarm, each UAV is equipped with one or several sensors and cooperates with the others to detect and track enemy targets. A UAV swarm whose members are connected by a data link to form a sensor network can be regarded as a target-tracking multi-agent system, so research on tracking policies in sensor networks extends to UAV swarms. The multi-agent system represented by the UAV swarm is therefore studied in the following three aspects. Firstly, to address the difficulty of inferring the enemy’s maneuvers and motion intention, a target motion intention inference algorithm is proposed. Preliminary position information of the target is obtained by the bionic infectious disease method. Based on this information, the motion of the target is modeled according to kinematic rules, and the motion intention is defined by a hidden Markov model in order to infer the motion state. The target motion intention inference algorithm weakens the influence of target maneuvering on the system and improves the tracking performance of the multi-agent system. Secondly, to address the difficulty of formulating a tracking policy in a UAV swarm network, a segmented proximal policy optimization algorithm based on the bionic infectious mechanism is proposed. The algorithm not
only integrates the bionic infectious mechanism with deep reinforcement learning but also divides the training process into two phases, sleep and wake-up, according to the reward mechanism, so that the network can learn a better target tracking policy. With this algorithm, the tracking task is completed by the learned policy, a bionic mechanism with self-learning ability is explored, and the feasibility of the algorithm is verified. Finally, when a UAV swarm performs target tracking tasks, the high-speed motion of the UAVs makes the tracking task computationally expensive. An end-to-end UAV swarm deep reinforcement learning algorithm based on “proximal policy optimization” is therefore proposed. Under the constraints of collision risk and the limited communication range between UAVs, the UAV motion model is established, the UAV swarm topology is defined, and the tracking task is formulated as a Markov decision process for training. A “centralized training, distributed execution” scheme is designed to train the UAV swarm. In contrast to a traditional centralized scheduling system, each UAV can deploy its own policy network to complete the tracking task. |
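
Both tracking algorithms summarized above build on proximal policy optimization. As a rough illustration only, the standard clipped surrogate objective of PPO can be sketched as follows. This is not the segmented (sleep/wake-up) variant or the swarm training scheme proposed in the work; the function name `ppo_clip_objective`, its parameters, and the default clipping value `clip_eps=0.2` are assumptions made for this example.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    All names and the default clip_eps are illustrative assumptions,
    not taken from the abstract.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Under these assumptions, each UAV's policy update would maximize this quantity over sampled tracking transitions; the clipping keeps any single update from moving the policy too far from the one that collected the data.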