Multi-agent path finding (MAPF) is a challenging and meaningful problem in multi-agent systems, in which all agents are required to reach their goals concurrently without colliding with one another or with obstacles. Effectively extracting features from each agent's observation, exploiting historical information, and communicating efficiently with neighboring agents are the key challenges in completing this cooperative task. Although traditional search algorithms adopt many techniques to improve solution efficiency, they inevitably face enormous search difficulty and long solution times when the environment is large. Meanwhile, in recent years, learning-based models have still left many problems unsolved in large-scale training environments and in the deep training process. To tackle these issues, this thesis proposes a carefully designed deep network model, named Local Attention-Cooperated Reinforcement Learning (LACRL). It takes the local states of nearby agents and obstacles as input, outputs an action for each agent to execute, and is trained by deep reinforcement learning through repeated trial and error. Our approach has four major components. The first is the local observation encoder, which uses Convolutional Neural Networks to extract features from the local partial observation and collects a local guidance vector as direction information. The second is the history reuse module, which makes full use of historical observations with the help of a Gated Recurrent Unit. The third is the communication block, which uses a Transformer architecture to combine an agent's partial observation with its neighbors' observations. The final part is the decision block, which outputs the action policy based on the local observation and the neighbors' communication information. Based on these four components, every agent obtains an expected action from the model each time it receives an observation.
Each agent then interacts with the environment according to the policy given by the model, thereby executing its own decentralized planning strategy. Within the given number of time steps in an experiment, the proportion of agents that reach their goals serves as the success rate of the model. However, the success rate alone is not enough to measure the quality of the model's policy, because it only considers whether an agent reaches its goal within the given time and ignores the quality of the path it finds. Therefore, this thesis introduces a new metric, the extra time rate, i.e. the proportion of extra time an agent requires beyond its shortest path, to measure the strengths and weaknesses of a policy. Then, by varying experimental settings such as the number of agents, the size of the environment, and the density of obstacles, the proposed method is compared with previous benchmark algorithms across multiple environments. The experimental results show that the proposed model outperforms many benchmark methods in both success rate and extra time rate, and that it is effective in practice, especially in large-scale worlds.
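The data flow through the four components can be sketched as a toy forward pass for a single agent. This is an illustrative sketch only, not the thesis's actual implementation: all dimensions, parameter shapes, and function names (`encode_obs`, `gru_cell`, `attend`, `decide`) are assumptions; the convolutional encoder is stood in for by a single linear projection, and the Transformer communication block is reduced to one scaled dot-product attention step over neighbor features.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16          # feature dimension (assumed)
N_ACTIONS = 5   # e.g. up, down, left, right, stay (assumed action set)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# --- 1. local observation encoder (CNN stand-in: one linear projection) ---
# assumed 9x9 field of view with 3 channels, plus a 2-d local guidance vector
W_enc = rng.normal(scale=0.1, size=(9 * 9 * 3 + 2, D))

def encode_obs(fov, guidance):
    x = np.concatenate([fov.ravel(), guidance])
    return np.tanh(x @ W_enc)

# --- 2. history reuse module (a single GRU cell) ---
Wz, Uz = rng.normal(scale=0.1, size=(2, D, D))
Wr, Ur = rng.normal(scale=0.1, size=(2, D, D))
Wh, Uh = rng.normal(scale=0.1, size=(2, D, D))

def gru_cell(x, h):
    z = sigmoid(x @ Wz + h @ Uz)             # update gate
    r = sigmoid(x @ Wr + h @ Ur)             # reset gate
    cand = np.tanh(x @ Wh + (r * h) @ Uh)    # candidate hidden state
    return (1 - z) * h + z * cand

# --- 3. communication block (one scaled dot-product attention step) ---
def attend(own, neighbor_feats):
    K = np.stack([own] + neighbor_feats)     # keys/values: self + neighbors
    scores = K @ own / np.sqrt(D)            # similarity of each source to self
    return softmax(scores) @ K               # attention-weighted mixture

# --- 4. decision block ---
W_dec = rng.normal(scale=0.1, size=(D, N_ACTIONS))

def decide(features):
    return softmax(features @ W_dec)         # action probabilities

# toy forward pass for one agent with two neighbors
fov = rng.normal(size=(9, 9, 3))
h = np.zeros(D)                              # initial GRU hidden state
enc = encode_obs(fov, guidance=np.array([0.0, 1.0]))
h = gru_cell(enc, h)
neighbors = [rng.normal(size=D) for _ in range(2)]
policy = decide(attend(h, neighbors))
print(policy.shape)                          # (5,)
```

In the full model each agent would run this pipeline at every time step, carrying `h` forward so that the GRU accumulates history, and the attention step would draw its neighbor features from the other agents' encoders rather than from random vectors.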
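The two evaluation metrics can be stated concretely as follows. The function names and the exact averaging scheme (averaging the extra time rate over successful agents only) are illustrative assumptions, since the abstract does not fix them.

```python
def success_rate(reached_goal):
    """Fraction of agents that reached their goals within the time limit."""
    return sum(reached_goal) / len(reached_goal)

def extra_time_rate(actual_steps, shortest_steps):
    """Average proportion of extra steps beyond each agent's shortest path,
    computed over the agents that reached their goals."""
    ratios = [(a - s) / s for a, s in zip(actual_steps, shortest_steps)]
    return sum(ratios) / len(ratios)

# toy example: 3 of 4 agents reach their goals within the time limit
sr = success_rate([True, True, True, False])      # 0.75
# the successful agents needed 12, 10 and 9 steps
# against shortest-path lengths of 10, 10 and 6
etr = extra_time_rate([12, 10, 9], [10, 10, 6])   # (0.2 + 0.0 + 0.5) / 3
```

A lower extra time rate means the found paths are closer to optimal, which is exactly the path-quality information the success rate alone cannot capture.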