Research on multi-agent systems is of great significance and practical value in many fields, from military missions to civilian applications. With the development of a new generation of artificial intelligence, multi-agent cooperation based on deep reinforcement learning is an important direction for future work. Most current research assumes that each agent can observe the relevant information of all other agents from its own point of view. In practice, however, communication constraints mean that the topology of a multi-agent system is not always fully connected, and under partially observable, distributed conditions the system must also cope with complex environments and multiple tasks. To address these problems, this dissertation builds on multi-agent systems, reinforcement learning, and deep learning, and carries out research step by step on reward functions, cooperative policies, value optimization, and neural network architectures. This work explores multi-agent deep reinforcement learning methods that provide advanced theoretical methods and technical support for multi-agent distributed problems with fixed topology, switching topology, complex environments, and multiple tasks in practical applications.

For multi-agent distributed cooperation under a fixed topology, a multi-agent distributed deep reinforcement learning method based on the deep deterministic policy gradient is proposed. In a distributed cooperative environment with limited communication, each agent can obtain only information related to its neighbors. A private reward mechanism based on the topology structure is designed, and each agent contains a decentralized actor and a distributed critic for decentralized execution and distributed training, respectively. Results obtained in different multi-agent distributed tracking environments show that the
proposed method effectively improves training speed and achieves better performance.

For multi-agent distributed cooperation under a switching topology, a multi-agent deep reinforcement learning method based on level fusion is proposed. A direction-assisted actor and a dimensional pyramid fusion critic are constructed for each agent. With the goal of maximizing the overall reward, the total reward of each agent is designed as a weighted sum of an individual reward and a shared reward, controlled by a reward influence factor. An additional loss term acts as an experience advisor to keep policy learning stable. Results obtained in different multi-agent distributed switching-topology interception environments show that the proposed method achieves better learning performance and a better interception effect.

For multi-agent distributed cooperation in complex environments, a multi-agent twin deep reinforcement learning method based on an attention mechanism is proposed. The twin attention critic is designed to alleviate the frequent overestimation of the state-action value, and the critic's loss function is built from the state-action value generated by the current twin attention network and the value estimated by the target twin attention network. Because the critic guides the actor with different efficiency at different learning stages, a delay attenuation policy is proposed to further improve the actor's learning ability, and a variable entropy weight coefficient is introduced to meet the needs of policy exploration at different stages. Results obtained in different multi-agent cooperative defensive escort environments show that the proposed method is effective and scalable.

For multi-agent distributed cooperation under multi-task conditions, a multi-agent branch attention deep reinforcement learning method based
on Q-value mixing is proposed. A branch attention agent network is constructed to improve the prediction of state-action values, and a multi-agent adaptive branch QMIX network is designed to comprehensively evaluate the information of each agent. A delayed exploration loss encourages the agent to explore and makes training of the Q-value mixing function more stable. Results obtained in different multi-agent multi-task cooperative rescue environments show that the proposed method achieves better learning effects and a higher success rate.

The multi-agent distributed cooperation methods based on deep reinforcement learning proposed in this dissertation expand the application of artificial intelligence algorithms in multi-agent systems, and have essential theoretical value and application prospects for guiding research on multi-agent distributed cooperation.
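Two of the ingredients summarized above can be made concrete with simple formulas: the switching-topology method combines an individual reward and a shared reward through a reward influence factor, and the twin critic counters overestimation in the spirit of a clipped double-Q target (taking the minimum of two target critics' estimates). A minimal sketch under these assumptions follows; the function names, the exact weighting form, and the clipped-minimum target are illustrative and not taken verbatim from the dissertation:

```python
def total_reward(individual, shared, beta):
    """Weighted sum of an individual and a shared reward.

    `beta` plays the role of the reward influence factor: beta = 1
    makes the agent purely self-interested, beta = 0 purely team-driven.
    The exact weighting used in the dissertation may differ.
    """
    return beta * individual + (1.0 - beta) * shared


def twin_critic_target(reward, next_q1, next_q2, gamma=0.99, done=False):
    """Clipped double-Q style target for a twin critic.

    Taking the minimum of the two target critics' estimates yields a
    conservative bootstrap value, which counteracts the overestimation
    of the state-action value that the twin critic is meant to address.
    """
    min_next_q = min(next_q1, next_q2)
    return reward + gamma * (1.0 - float(done)) * min_next_q
```

For instance, with `beta = 0.5` an agent weights its own reward and the shared team reward equally, while the twin-critic target bootstraps from the smaller of the two next-state value estimates (and from the reward alone at terminal states).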