Multi-agent systems have attracted widespread attention because of their autonomy, scalability, and flexibility. As the data scale of multi-agent systems grows and the required degree of intelligence rises, multi-agent deep reinforcement learning has gradually become a key area of research on multi-agent intelligent decision-making and control. In recent years, multi-agent deep reinforcement learning has made progress in several fields, but it still faces many problems and challenges. First, whether the policies learned by the agents can converge to the optimum in different dynamic environments. Second, the limited ability of agents to reuse learning experience. Third, the reward signal, which serves as feedback on the agents' interaction with the environment, is generally sparse. Addressing these problems within existing frameworks, this thesis completes the following work:

(1) When deep reinforcement learning algorithms focus on the global team reward, they fail to pay sufficient attention to the role relationships between agents, which weakens the overall strategic collaboration ability of the multi-agent system. Therefore, a collaborative average multi-agent deep reinforcement learning method based on value decomposition (CAMR) is proposed. First, the method not only maximizes the joint action-value function but also uses the individual local value functions to introduce a value function as a weight operator, reducing adverse interference between agents and improving the task success rate. Second, the exploration strategy is optimized by adding a Euclidean-distance criterion, and an exploration method is designed to help agents escape suboptimal policies, improving exploration efficiency and policy cooperation. Finally, experimental results in the real-time strategy game StarCraft II demonstrate the effectiveness of CAMR.

(2) To address the shortcomings of experience-reuse sampling in the training of deep reinforcement learning algorithms, as well as the tendency to emphasize action values while weakening state values in agents' decision-making, a collaborative multi-agent deep reinforcement learning method based on a priority value network (PVMA) is proposed. First, the method introduces a prioritized experience replay mechanism, which reuses experience according to the importance weights of the experience data and compensates for the drawbacks of uniform random sampling. Second, an advantage-network structure is introduced into the agent's value network to estimate the difference in information between the state value and the action advantage. Finally, experimental results in multiple collaboration scenarios show that the proposed method improves the learning and cooperation quality of the agents, so that they complete the given tasks faster and better.

(3) To address the sparse distribution of reward-signal feedback in deep reinforcement learning algorithms, a multi-agent deep reinforcement learning collaboration method based on behavioral motivation reward (CBMR-MARL) is proposed. First, according to the dynamic changes of the environment, the method starts from the behavioral motivation of each agent's task role, decomposes and shapes the corresponding reward functions to enrich the feedback density of the reward signal, and superimposes an estimate of the double joint value function. Second, a simulation environment for a cooperative "pursuit-escape" model is built on the Unity 3D engine. Finally, experimental results in the simulation scenarios show that the new reward-design scheme accelerates the training process and improves the agents' training results.
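The weight-operator idea in CAMR's value decomposition can be illustrated with a minimal sketch. The function name and the softmax weighting below are illustrative assumptions, not the thesis implementation; the point is only that each agent's local value estimate both contributes to and weights the joint value, so agents with higher local estimates influence the joint value more:

```python
import numpy as np

def weighted_joint_value(local_q_values):
    """Combine per-agent local action values into a joint value.

    Sketch under assumed names: each agent's local value also serves
    as a softmax-normalised weight, a simplified stand-in for the
    weight operator derived from the individual value functions.
    """
    q = np.asarray(local_q_values, dtype=float)
    weights = np.exp(q - q.max())      # softmax weights from local values
    weights /= weights.sum()
    return float(np.dot(weights, q))   # weighted joint value

# Usage: three agents' chosen-action values; the result is pulled
# toward the higher local estimates rather than the plain mean.
v = weighted_joint_value([1.0, 2.0, 3.0])
```

With equal local values the weights are uniform and the joint value reduces to the ordinary mean, so the weighting only reshapes the mixture when agents disagree.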
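The prioritized experience replay mechanism that PVMA introduces can be sketched in its standard proportional form. The class below is a simplified illustration (not the thesis code): transitions with larger TD error are sampled more often, and importance-sampling weights correct the bias that this non-uniform sampling introduces:

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized experience replay (simplified sketch)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity, self.alpha, self.beta = capacity, alpha, beta
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        # Priority grows with |TD error|; the small constant keeps
        # every transition sampleable.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size, rng=np.random):
        p = np.asarray(self.priorities)
        p = p / p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        # Importance-sampling weights compensate for the non-uniform
        # sampling; normalising by the max keeps them in (0, 1].
        w = (len(self.data) * p[idx]) ** (-self.beta)
        w /= w.max()
        return [self.data[i] for i in idx], w, idx
```

In training, the returned weights scale each transition's loss term, and the sampled transitions' priorities are refreshed with their new TD errors after the update.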
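The role-based reward shaping in CBMR-MARL can be illustrated for the pursuit-escape setting. The coefficients and function name below are hypothetical, chosen only to show the shape of such a decomposition: each role gets a dense reward tied to its behavioral motivation (closing or widening the distance) plus a sparse terminal term on capture:

```python
def shaped_reward(role, caught, dist_prev, dist_now):
    """Dense reward shaped by task role in a pursuit-escape task.

    Sketch with hypothetical coefficients: pursuers are rewarded for
    closing the distance to the target, evaders for increasing it,
    and a large terminal reward/penalty is added on capture.
    """
    progress = dist_prev - dist_now  # > 0 means the gap shrank
    if role == "pursuer":
        return 0.1 * progress + (10.0 if caught else 0.0)
    else:  # evader
        return -0.1 * progress + (-10.0 if caught else 0.0)
```

The dense distance term supplies feedback at every step, which is the enriched reward density the method aims for, while the terminal term preserves the original task objective.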