Nowadays, intelligent applications have become an indispensable part of people's lives. However, in many scenarios a single agent cannot meet the needs of complex tasks, so multi-agent cooperation has become an important solution. Multi-agent collaboration faces many challenges, such as coordinating the behavior of multiple agents and optimizing their decision-making. This paper studies the optimization of multi-agent coordination policies based on deep reinforcement learning. The main research contents are as follows:

(1) A hierarchical reinforcement learning coordination approach is proposed for the multi-agent collaborative joint policy optimization problem. In multi-agent collaboration, each agent must choose appropriate actions based on the environment and the states of other agents to achieve a common goal. However, each agent can observe only local information, which makes global optimization of the entire system difficult. In addition, the interactions and coupling relationships between agents make the policy optimization problem more complex. For multi-agent collaborative tasks, a hierarchical reinforcement learning algorithm is studied that divides the coordination problem into two levels: the high-level policy outputs a target allocation based on the environmental state, while the low-level policy outputs primitive actions based on the target allocation and local observations. This method balances global and local optimization, coordinates action selection among agents, and achieves coordinated decision-making together with overall optimization and stability of the system. For policy evaluation and improvement in multi-agent systems, adaptive adjustment and optimization of the model are studied, and a reward regulation mechanism is investigated to reduce the impact of reward deviation on policy learning and to improve the learning and adaptation capabilities of the agents.

(2) A priority-based experience sharing method is proposed to address the efficiency problem of multi-agent learning. The study investigates experience sharing among multiple agents to achieve knowledge sharing. An experience pool mechanism is explored to realize information sharing and collaborative learning among agents, which accelerates overall learning. By prioritizing experiences, the method improves learning speed and efficiency through preferential replay of the experiences that contribute most to overall performance. The method consists of two steps: priority calculation and experience sampling. In priority calculation, a priority function assigns each experience a priority that reflects its impact on the current policy network parameters. In experience sampling, experiences are sampled according to their priority, so that higher-priority experiences are selected with higher probability. In this way, the experiences that contribute to overall performance are learned from and applied more efficiently, which accelerates the convergence of multi-agent reinforcement learning and improves its performance. Finally, the effectiveness of the algorithm in multi-agent collaborative scenarios is validated through experiments.
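The two-level decomposition described in (1) can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the class names, linear policy heads, and all dimensions (`n_agents`, `n_targets`, etc.) are assumptions introduced here only to show how a high-level target allocation conditions the low-level action choice.

```python
import numpy as np

rng = np.random.default_rng(0)

class HighLevelPolicy:
    """Maps the global environment state to one target index per agent."""
    def __init__(self, state_dim, n_agents, n_targets):
        # One linear head per agent, scoring each candidate target.
        self.W = rng.normal(size=(n_agents, n_targets, state_dim)) * 0.1

    def assign_targets(self, state):
        logits = self.W @ state            # shape (n_agents, n_targets)
        return logits.argmax(axis=1)       # greedy target allocation

class LowLevelPolicy:
    """Maps (local observation, assigned target) to a primitive action."""
    def __init__(self, obs_dim, n_targets, n_actions):
        self.n_targets = n_targets
        self.W = rng.normal(size=(n_actions, obs_dim + n_targets)) * 0.1

    def act(self, obs, target):
        one_hot = np.eye(self.n_targets)[target]   # condition on the goal
        logits = self.W @ np.concatenate([obs, one_hot])
        return int(logits.argmax())

# One coordinated decision step for 3 agents.
n_agents, n_targets, state_dim, obs_dim, n_actions = 3, 4, 8, 5, 6
high = HighLevelPolicy(state_dim, n_agents, n_targets)
low = LowLevelPolicy(obs_dim, n_targets, n_actions)

state = rng.normal(size=state_dim)                   # global state (high level)
targets = high.assign_targets(state)                 # target allocation per agent
actions = [low.act(rng.normal(size=obs_dim), t) for t in targets]
```

The key design point this sketch illustrates is that only the high-level policy sees the global state, while each low-level policy acts on its own local observation plus the assigned target, matching the decomposition into global and local optimization.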
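The reward regulation mechanism mentioned in (1) is not specified in detail here; one plausible instance is running reward normalization, sketched below under that assumption. The class name `RewardRegulator` and the use of Welford's online mean/variance update are illustrative choices, not the thesis's method.

```python
class RewardRegulator:
    """Tracks a running mean and variance of raw rewards and emits
    normalized rewards, damping the effect of reward deviation
    (e.g., outliers or scale drift) on policy learning."""
    def __init__(self, eps=1e-8):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0
        self.eps = eps

    def regulate(self, reward):
        # Welford's online update of mean and (unnormalized) variance.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (reward - self.mean) / ((var + self.eps) ** 0.5)

reg = RewardRegulator()
# A large raw outlier (100.0) is mapped to a bounded normalized value.
normalized = [reg.regulate(r) for r in [10.0, 12.0, 8.0, 11.0, 100.0]]
```
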
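The two steps of the priority-based experience sharing method in (2), priority calculation and priority-proportional sampling from a shared experience pool, can be sketched as below. The buffer class, the use of absolute TD error as the priority signal, and the exponent `alpha` are assumptions for illustration; the thesis's actual priority function is not specified here.

```python
import numpy as np

rng = np.random.default_rng(1)

class SharedPrioritizedBuffer:
    """Experience pool shared by all agents. Sampling probability grows
    with priority (here |TD error| ** alpha), so transitions that most
    affect the current policy are replayed more often."""
    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.experiences = []   # (agent_id, transition) pairs
        self.priorities = []

    def add(self, agent_id, transition, td_error):
        # Step 1: priority calculation from the learning signal.
        self.experiences.append((agent_id, transition))
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        # Step 2: sampling proportional to normalized priority.
        p = np.array(self.priorities)
        probs = p / p.sum()
        idx = rng.choice(len(self.experiences), size=batch_size, p=probs)
        return [self.experiences[i] for i in idx]

buf = SharedPrioritizedBuffer()
# Two agents contribute (state, action, reward, next_state) transitions;
# a larger |TD error| yields a higher replay priority.
buf.add(agent_id=0, transition=("s0", 1, 0.0, "s1"), td_error=0.1)
buf.add(agent_id=1, transition=("s2", 0, 1.0, "s3"), td_error=2.5)
batch = buf.sample(batch_size=4)
```

Because both agents write into the same pool, each agent's update can draw on the other's high-priority transitions, which is the knowledge-sharing effect the abstract describes.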