In recent years, multi-agent reinforcement learning has developed rapidly and achieved significant results. Recent methods for multi-agent reinforcement learning make use of deep neural networks and provide state-of-the-art performance. However, several problems and challenges remain in this field. These deep reinforcement learning methods suffer from reproducibility issues: the algorithms themselves are unstable, so reaching high performance in many scenarios is difficult without special training techniques. First, even with coarse-grained scenario transfer, these methods may still train inefficiently, e.g., in the simplest scenario of moving agents versus stationary ones. Reproducing the original performance with existing multi-agent reinforcement learning algorithms remains difficult, and the performance degradation is especially serious when transferring to a complex scenario. Second, large-scale multi-agent reinforcement learning environments still suffer from state-space explosion: in large-scale scenarios the algorithms themselves struggle to reach a high level of performance, and when transferring from small-scale to large-scale scenarios, existing network structures have difficulty adapting across agent-number scales. This work therefore focuses on transfer algorithms and distributed extensions of reinforcement learning.

Firstly, we propose fine-grained curriculum transfer training for multi-agent problems. The fine-grained curricula are ordered from easy to hard by varying chosen environment hyper-parameters, e.g., the initial distance between competing agents grows from near to far, and the agents' initial speed and acceleration change (a minimal schedule is sketched below). The experimental results indicate that UAV agents trained via fine-grained curriculum transfer learning reach the target score more quickly and achieve superior performance; compared with coarse-grained transfer learning, both the convergence speed and the winning rate of fine-grained curriculum transfer training are higher.

Secondly, we propose an observation-aggregation network based on the attention mechanism for transfer learning. A graph neural network is combined with the multi-agent deep deterministic policy gradient (MADDPG) algorithm, and an aggregation function processes the agents' observations and feeds them to the network as input. The attention mechanism complements the aggregation function: it aggregates each agent's observation groups so that the network's input dimension is independent of the number of agents in the environment, while the network can still learn from the separate observation groups of itself, its teammates, and its enemies. Training starts from a small-scale multi-UAV combat scenario, and the number of agents is gradually increased. The experimental results indicate that, compared with transfer learning without attention, the MADDPG-based attention-aggregation transfer network reaches the target performance faster and achieves better performance.
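As an illustrative sketch only, the following Python fragment shows one way such a fine-grained curriculum could be scheduled, tightening the environment hyper-parameters named above stage by stage; the numeric values and the `make_env` and `train_stage` helpers are hypothetical placeholders, not the actual implementation of this work.

```python
from dataclasses import dataclass

@dataclass
class CurriculumStage:
    initial_distance: float  # spawn distance between competing agents
    max_speed: float         # upper bound on agent speed
    acceleration: float      # per-step acceleration limit
    target_score: float      # score that promotes training to the next stage

# Stages ordered from easy (near, slow) to hard (far, fast); values are made up.
CURRICULUM = [
    CurriculumStage(10.0, 0.5, 0.1, target_score=0.8),
    CurriculumStage(25.0, 1.0, 0.2, target_score=0.8),
    CurriculumStage(50.0, 2.0, 0.4, target_score=0.8),
]

def run_curriculum(policy, make_env, train_stage):
    """Train `policy` stage by stage; its weights carry over between stages."""
    for stage in CURRICULUM:
        env = make_env(distance=stage.initial_distance,
                       speed=stage.max_speed,
                       accel=stage.acceleration)
        score = train_stage(policy, env)   # hypothetical: train, then evaluate
        while score < stage.target_score:  # stay on this stage until promoted
            score = train_stage(policy, env)
    return policy
```

Because the policy's weights carry over between stages, each stage starts from behavior already learned in the previous, easier one, which is the intended source of the faster convergence.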
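For the attention-based aggregation described above, a minimal sketch (assuming PyTorch) illustrates how a learned query can pool a variable number of per-entity observations into a fixed-size vector, so the actor and critic input dimension stays constant as the number of agents grows; the module names, feature sizes, and the self/teammate/enemy grouping interface are assumptions for illustration, not the thesis code.

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Pool a variable-length set of entity observations into one vector."""
    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)          # shared per-entity encoder
        self.query = nn.Parameter(torch.randn(embed_dim))   # learned pooling query
        self.scale = embed_dim ** -0.5

    def forward(self, entity_obs: torch.Tensor) -> torch.Tensor:
        # entity_obs: (batch, n_entities, obs_dim); n_entities may vary.
        keys = self.embed(entity_obs)                       # (B, N, E)
        scores = (keys @ self.query) * self.scale           # (B, N)
        weights = torch.softmax(scores, dim=-1)             # attention over entities
        return (weights.unsqueeze(-1) * keys).sum(dim=1)    # (B, E), size fixed

class GroupedObsEncoder(nn.Module):
    """Aggregate the self, teammate, and enemy groups separately, then
    concatenate, giving the MADDPG actor/critic a fixed-size input
    regardless of how many agents are in the environment."""
    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.self_enc = nn.Linear(obs_dim, embed_dim)
        self.team_agg = AttentionAggregator(obs_dim, embed_dim)
        self.enemy_agg = AttentionAggregator(obs_dim, embed_dim)

    def forward(self, own, teammates, enemies):
        # own: (B, obs_dim); teammates/enemies: (B, N, obs_dim) with any N.
        return torch.cat([self.self_enc(own),
                          self.team_agg(teammates),
                          self.enemy_agg(enemies)], dim=-1)
```

Since the attention weights are normalized over however many entities are present, adding or removing teammates or enemies changes only the length of the softmax, not the size of the encoder's output.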
Lastly, to address the performance bottleneck of the large-scale multi-agent reinforcement learning algorithm itself, we use a distributed parallel framework based on the MADDPG algorithm and study how to improve the performance and computational efficiency of large-scale multi-agent reinforcement learning. To improve training efficiency, nodes are added to train multi-agent UAV combat in a distributed manner. The combination of fine-grained curriculum transfer learning and the distributed framework supports not only multi-threaded training on a single machine but also multi-node distributed training on clusters. Through this distributed method, the training speed of multi-agent reinforcement learning is improved.
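As a rough sketch of the parallel-rollout idea, and covering only the single-machine multi-process case (a multi-node cluster deployment would additionally need a communication layer that this omits), the fragment below runs several experience-collecting workers in parallel; `make_env`, `run_episode`, and `policy_weights` are hypothetical injected interfaces, not the framework used in this work.

```python
import multiprocessing as mp

def rollout_worker(make_env, run_episode, policy_weights, out_queue, episodes):
    """Collect experience with a local environment copy and ship it back."""
    env = make_env()
    for _ in range(episodes):
        out_queue.put(run_episode(env, policy_weights))

def parallel_collect(make_env, run_episode, policy_weights,
                     n_workers=4, episodes_per_worker=10):
    """Run `n_workers` rollout processes; a central learner consumes the batches."""
    queue = mp.Queue()
    workers = [mp.Process(target=rollout_worker,
                          args=(make_env, run_episode, policy_weights,
                                queue, episodes_per_worker))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    # Drain exactly the number of batches the workers will produce,
    # then join; draining before joining avoids a full-queue deadlock.
    batches = [queue.get() for _ in range(n_workers * episodes_per_worker)]
    for w in workers:
        w.join()
    return batches  # the MADDPG learner updates its networks from these samples
```

In this pattern the environment simulation, typically the dominant cost, is parallelized across workers, while the policy-update step remains centralized, which is what allows the same structure to scale from threads on one machine to processes across cluster nodes.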