With its "human-like" learning mechanism, deep reinforcement learning holds great promise in the field of autonomous decision-making and is widely applied to intelligent military decision-making. However, work in this field has mostly relied on simple discrete decision control, which cannot adapt to high-dimensional, continuous, and complex military environments; moreover, the algorithms are difficult to converge under multi-agent decision-making, and the ability of agents to cooperate and communicate is weak. To address these problems, this paper proceeds along two lines: improving the convergence speed and stability of multi-agent algorithms, and strengthening cooperation and communication among agents. The main research contents and contributions of this paper are as follows: (1) This paper breaks with the prevailing practice of applying discrete reinforcement learning algorithms to intelligent military decision-making and instead applies continuous deep reinforcement learning, thereby avoiding the rasterization of the military decision environment, the discretization of decision actions, and the inability of discrete algorithms to handle high-dimensional state spaces. DDPG, a representative continuous deep reinforcement learning algorithm, is selected as the baseline for the subsequent research. (2) To address the fact that DDPG cannot be applied directly in multi-agent environments, an improved multi-agent reinforcement learning algorithm based on DDPG, SD-DDPG, is proposed. By introducing an experience-value-first method, an exploration strategy based on double noise is designed to realize complex, continuous military decision behavior. At the same time, a multi-agent decision framework with single-training-mode control is constructed to improve the training efficiency of the algorithm. The experimental results
show that the SD-DDPG algorithm achieves continuous decision control in multi-agent environments with good convergence speed and stability. (3) To address the weak communication and collaboration between agents in multi-agent environments, an improved multi-agent reinforcement learning method based on MADDPG, P-MADDPG, is proposed. Combining prioritized experience replay with the original algorithm improves its convergence speed and stability, and two different reward functions are designed to guide the agents to complete operational tasks efficiently. The experimental results show that, compared with the SD-DDPG algorithm, each agent under P-MADDPG has a stronger ability to communicate and collaborate, and achieves a higher per-episode reward and better stability than the MADDPG baseline algorithm.
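The abstract does not detail the double-noise exploration strategy used in SD-DDPG. A minimal sketch, assuming it combines temporally correlated Ornstein-Uhlenbeck noise with independent Gaussian noise added to the deterministic policy output (the class and parameter names here are hypothetical, not taken from the thesis), might look like:

```python
import numpy as np

class DoubleNoiseExplorer:
    """Sketch of a double-noise exploration strategy for a deterministic
    policy: mean-reverting Ornstein-Uhlenbeck noise supplies temporally
    correlated exploration, while independent Gaussian noise adds local
    jitter. The sum is clipped to the valid continuous action range."""

    def __init__(self, action_dim, ou_theta=0.15, ou_sigma=0.2,
                 gauss_sigma=0.1, low=-1.0, high=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.ou_state = np.zeros(action_dim)   # OU process internal state
        self.ou_theta, self.ou_sigma = ou_theta, ou_sigma
        self.gauss_sigma = gauss_sigma
        self.low, self.high = low, high

    def __call__(self, action):
        # Ornstein-Uhlenbeck update: drift back toward zero plus diffusion
        self.ou_state += (-self.ou_theta * self.ou_state
                          + self.ou_sigma * self.rng.standard_normal(action.shape))
        # Uncorrelated Gaussian noise on top of the OU component
        gauss = self.gauss_sigma * self.rng.standard_normal(action.shape)
        return np.clip(action + self.ou_state + gauss, self.low, self.high)
```

In use, the explorer wraps each action emitted by the actor network during training, e.g. `noisy_action = explorer(actor(state))`, and is bypassed at evaluation time.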
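Prioritized experience replay, the component combined with MADDPG in P-MADDPG, is a standard technique: transitions with larger TD error are sampled more often than uniform replay would allow. A minimal proportional-prioritization sketch (hypothetical class and method names, simplified to omit importance-sampling weight correction) could be:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Simplified proportional prioritized experience replay.
    Sampling probability of transition i is p_i^alpha / sum_j p_j^alpha,
    where p_i is the magnitude of its last TD error."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []
        self.rng = np.random.default_rng(seed)
        self.max_priority = 1.0  # new transitions get max priority

    def add(self, transition):
        if len(self.data) >= self.capacity:   # evict oldest when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(self.max_priority)

    def sample(self, batch_size):
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # eps keeps every transition's sampling probability non-zero
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + eps
            self.max_priority = max(self.max_priority, self.priorities[i])
```

A production implementation would typically use a sum-tree for O(log n) sampling and apply importance-sampling weights to the critic loss; this sketch keeps only the prioritization logic.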