Multi-agent area coverage is an important application of Multi-Agent Systems (MAS) in battlefield reconnaissance, environmental exploration, search and rescue, and other fields. The task requires multiple agents with perception capabilities to thoroughly traverse a given area. Existing area coverage methods suffer from weak cooperative planning capability and redundant coverage paths; moreover, they oversimplify the simulation of the coverage environment and pay little attention to the non-ideal communication conditions and obstacle avoidance constraints that arise in real tasks. Motivated by the practical application of area coverage algorithms, this thesis focuses on area coverage methods based on multi-agent reinforcement learning. The area coverage problem is decomposed into discrete path planning and continuous motion control: after the target area is uniformly gridded according to the agents' sensing range, complete traversal of the target area is achieved by alternating between coverage path planning and motion control.

For the discrete path planning task, a multi-agent cooperative coverage path planning algorithm for grid-partitioned areas is proposed. A coverage map is introduced to record the coverage progress of the MAS, and the agents' reinforcement learning elements, such as the observation state, action space, and reward function, are constructed on top of it. Coverage map fusion shares coverage state information among the agents and mitigates the problem of incomplete observation of the environment. The QMIX algorithm is then improved in two aspects: the network model structure and the parameter sharing mechanism. The MAS performs distributed cooperative coverage path planning with the goal of maximizing the global return, effectively reducing overlap and redundancy in the covered area.

For the continuous motion control task, models are first built for agent motion control, non-ideal communication, and single-line LiDAR-based obstacle detection, so as to simulate agent behavior in a realistic coverage environment. The SAC (Soft Actor-Critic) algorithm is then extended under the centralized training with decentralized execution framework, yielding a multi-agent cooperative motion control algorithm suited to environments with unknown obstacles, in which all agents are treated as homogeneous and learn a common control policy. The reward sparsity problem is addressed by adding auxiliary rewards to the reward function. Finally, a feature extraction method based on the attention mechanism is proposed, which adaptively aggregates salient features of neighboring agents and obstacles according to relative position, velocity, and other information, alleviating both the varying dimensionality of the observation caused by partial observability and the curse of dimensionality in centralized training.

A multi-UAV environment is built in the Webots simulator for area coverage simulation. The experimental results demonstrate that the two-stage multi-agent area coverage method proposed in this thesis can be applied to UAV swarms with high coverage efficiency and robustness: the average coverage time is shortened by more than 11.6% compared with the Anti-Flocking algorithm, and the area coverage task can still be accomplished under a limited communication range and randomly placed obstacles.
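To make the two-stage setup concrete, the sketch below shows one way to grid the target area from the agents' sensing radius and to fuse agents' coverage maps by element-wise maximum. The inscribed-square cell size and the binary map encoding are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def make_coverage_map(area_w, area_h, sense_radius):
    """Grid the target area so one cell fits inside the sensing footprint.

    Assumption: a square cell inscribed in a circular sensing range of
    radius r has side length r * sqrt(2).
    """
    cell = sense_radius * np.sqrt(2.0)
    rows = int(np.ceil(area_h / cell))
    cols = int(np.ceil(area_w / cell))
    return np.zeros((rows, cols), dtype=np.int8)  # 0 = uncovered, 1 = covered

def fuse_coverage_maps(local_maps):
    """Merge the coverage maps of agents currently in communication range.

    A cell counts as covered if any agent has covered it, so an
    element-wise maximum implements the fusion.
    """
    fused = local_maps[0].copy()
    for m in local_maps[1:]:
        np.maximum(fused, m, out=fused)
    return fused

# Example: two agents exchange and fuse their coverage maps.
a = make_coverage_map(100.0, 100.0, sense_radius=5.0)
b = a.copy()
a[0, 0] = 1   # agent A covered cell (0, 0)
b[3, 2] = 1   # agent B covered cell (3, 2)
print(fuse_coverage_maps([a, b]).sum())  # -> 2 covered cells after fusion
```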
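The thesis's specific structural and parameter-sharing modifications to QMIX are not reproduced here, but the following sketch of the standard QMIX mixing network (Rashid et al., 2018) shows the baseline being improved: hypernetwork-generated weights are made non-negative so that the joint value is monotonic in each agent's Q-value. Layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class QMixer(nn.Module):
    """Standard QMIX mixing network.

    Per-agent Q-values are combined into a joint Q_tot via a
    state-conditioned mixing network whose weights are kept
    non-negative (via abs) to guarantee monotonicity.
    """
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = self.hyper_w1(state).abs().view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = self.hyper_w2(state).abs().view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot
```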
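For the common control policy, a minimal sketch of a squashed-Gaussian SAC actor that all homogeneous agents execute on their own local observations is given below; experience from every agent updates the single parameter set during centralized training. The network sizes and the observation/action dimensions in the usage example are placeholders.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedGaussianActor(nn.Module):
    """One squashed-Gaussian SAC actor shared by all homogeneous agents."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        std = self.log_std(h).clamp(-5, 2).exp()
        dist = torch.distributions.Normal(self.mu(h), std)
        u = dist.rsample()  # reparameterized sample for the SAC actor loss
        # tanh squashing with the standard log-prob correction
        logp = dist.log_prob(u).sum(-1) \
             - (2 * (math.log(2.0) - u - F.softplus(-2 * u))).sum(-1)
        return torch.tanh(u), logp

# Decentralized execution: the same actor runs on each agent's observation.
actor = SharedGaussianActor(obs_dim=16, act_dim=2)
obs_all = torch.randn(4, 16)        # 4 agents' local observations
actions, logp = actor(obs_all)
```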
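The auxiliary-reward idea can be illustrated as follows; the specific terms (per-cell coverage bonus, step cost, collision penalty, completion bonus) and their coefficients are hypothetical placeholders standing in for the thesis's actual reward design.

```python
def shaped_reward(new_cells, collided, done, step_cost=0.01,
                  cover_bonus=1.0, collide_penalty=5.0, finish_bonus=10.0):
    """Illustrative dense reward for the coverage motion control task.

    The sparse task reward (completing coverage) is augmented with
    auxiliary terms: a bonus per newly covered cell, a small per-step
    cost to discourage wandering, and a collision penalty.
    """
    r = cover_bonus * new_cells - step_cost
    if collided:
        r -= collide_penalty
    if done:
        r += finish_bonus
    return r

print(shaped_reward(new_cells=3, collided=False, done=False))  # -> 2.99
```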
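Finally, the attention-based feature extraction can be sketched as masked scaled dot-product attention over a variable set of neighboring agents and obstacles, producing a fixed-size aggregate regardless of how many entities are currently observed. The single-head design and the feature dimensions are assumptions.

```python
import torch
import torch.nn as nn

class NeighborAttention(nn.Module):
    """Scaled dot-product attention over a variable number of neighbors.

    The query is the agent's own feature vector; keys/values are the
    features (e.g. relative position and velocity) of nearby agents and
    detected obstacles. A mask handles the varying neighbor count;
    each row is assumed to contain at least one valid entity.
    """
    def __init__(self, ego_dim, nbr_dim, d=64):
        super().__init__()
        self.d = d
        self.q = nn.Linear(ego_dim, d)
        self.k = nn.Linear(nbr_dim, d)
        self.v = nn.Linear(nbr_dim, d)

    def forward(self, ego, nbrs, mask):
        # ego: (B, ego_dim); nbrs: (B, N, nbr_dim); mask: (B, N), True = valid
        q = self.q(ego).unsqueeze(1)                      # (B, 1, d)
        k, v = self.k(nbrs), self.v(nbrs)                 # (B, N, d)
        scores = (q @ k.transpose(1, 2)) / self.d ** 0.5  # (B, 1, N)
        scores = scores.masked_fill(~mask.unsqueeze(1), float('-inf'))
        attn = torch.softmax(scores, dim=-1)
        return (attn @ v).squeeze(1)                      # (B, d), fixed size

# Usage: two agents, up to five observed entities each, padded and masked.
att = NeighborAttention(ego_dim=8, nbr_dim=6)
mask = torch.tensor([[1, 1, 0, 0, 0], [1, 1, 1, 1, 0]], dtype=torch.bool)
out = att(torch.randn(2, 8), torch.randn(2, 5, 6), mask)  # -> (2, 64)
```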