| As the applications of UAVs expand, a single UAV is often no longer sufficient, so researchers, inspired by the behavior of clusters in nature, have proposed the concept of unmanned clusters. This paper focuses on a typical problem in the study of unmanned clusters: multi-target assignment. The problem requires each UAV to reach one target location accurately while avoiding collisions with other UAVs and avoiding threat areas during flight. Traditional solutions treat it as an optimization problem. However, these methods require global information, and once the environment changes the optimal solution must be recalculated, so they cannot guarantee that the unmanned cluster system handles a dynamic environment in real time. In addition, there is no effective means of verifying the correctness of an algorithm in the actual physical environment. In real application scenarios it is also difficult for a UAV to fly accurately over each target because of the large GPS error. Accurate hovering generally requires differential GPS or other additional hardware support, which increases the cost and complexity of the system. In view of these limitations of traditional methods, the main contributions of this paper are as follows: (1) A multi-target dynamic assignment system architecture for unmanned clusters is proposed. The architecture integrates an algorithm training module, a cross-development-environment model deployment module, and a three-dimensional physical simulation experiment module. The three modules cooperate to complete the training, model deployment, and simulation experiments of the multi-target dynamic assignment model, forming a complete system architecture for this problem. (2) A multi-target dynamic assignment algorithm for unmanned clusters is proposed. The algorithm transforms the traditional multi-target
assignment problem into a multi-agent training problem based on the idea of multi-agent reinforcement learning, so that solving the problem changes from an optimization process into a Markov decision process. The transition between states depends only on the current state and the action selected from the observation of the environment, and the environment also changes randomly during training. Therefore, the model can adapt to a dynamic environment without retraining when the environment changes. On this basis, we also introduce a "critical area" to strengthen the training of collision avoidance. (3) An accurate hovering algorithm for UAVs is proposed. It is based on Q-learning, a reinforcement learning method. It uses computer vision information to map the relative position of the UAV and the target to a state space, so that the accurate hovering process is also formulated as a Markov decision process. The UAV can be controlled to hover precisely over the target because the state space can be divided into very small cells and the action of the UAV depends on the state transition. Finally, this paper constructs a 3D physical simulation environment based on ROS and Gazebo and implements a prototype system to verify the correctness and effectiveness of the algorithms. In the prototype system, a remote procedure call mechanism is used to build the model deployment module between the two environments, which solves the incompatibility between the UAV simulation environment and the model running environment. The algorithms are validated on the prototype system. |
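
To make the idea behind contribution (3) more concrete, the following is a minimal sketch of tabular Q-learning over a discretized relative-position state space. It is not the implementation used in this paper: the 7x7 grid, the five actions, the reward values, the hyperparameters, and the names `step`, `train`, and `greedy_rollout` are all assumptions chosen for illustration, and the toy transition function stands in for the vision pipeline and the UAV dynamics.

```python
import random

GRID = 7                                   # assumed: target offset discretized into 7x7 cells
CENTER = (GRID // 2, GRID // 2)            # cell in which the UAV is directly over the target
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # four moves plus "hold" (assumed)


def step(state, action):
    """Toy transition: shift the offset by one cell, clipped to the grid."""
    x = min(max(state[0] + action[0], 0), GRID - 1)
    y = min(max(state[1] + action[1], 0), GRID - 1)
    nxt = (x, y)
    reward = 1.0 if nxt == CENTER else -0.05   # assumed reward shaping
    return nxt, reward, nxt == CENTER


def train(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with the standard one-step Q-learning update."""
    rng = random.Random(seed)
    q = {((x, y), a): 0.0
         for x in range(GRID) for y in range(GRID)
         for a in range(len(ACTIONS))}
    for _ in range(episodes):
        state = (rng.randrange(GRID), rng.randrange(GRID))
        for _ in range(50):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q[(nxt, i)] for i in range(len(ACTIONS)))
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_rollout(q, start, max_steps=50):
    """Follow the learned greedy policy and return the visited cells."""
    path, state = [start], start
    for _ in range(max_steps):
        if state == CENTER:
            break
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _reward, _done = step(state, ACTIONS[a])
        path.append(state)
    return path


if __name__ == "__main__":
    q_table = train()
    print(greedy_rollout(q_table, (0, 0)))
```

In this sketch a finer grid corresponds to the smaller state cells mentioned above: it tightens the achievable hovering tolerance at the cost of a larger Q-table and longer training.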