As one of the most important manifestations of artificial intelligence, reinforcement learning has unique advantages, and correspondingly essential research value, when facing sequential decision-making problems. In recent years, multi-agent reinforcement learning methods have developed rapidly and can better meet the needs of real-world multi-agent scenarios. Multi-agent reinforcement learning follows the basic framework of reinforcement learning. Nevertheless, real-world scenarios often have a large number of possible states, so fully exploring the environment is costly. In the training phase, multiple agents make decisions in the scene simultaneously, and each agent's strategy is affected by the others, which makes the environment non-stationary. Deficiencies remain in practical applications with complex state spaces. First, the state space is complex and the data dimension representing the state is large. Agents need to interact with the environment many times in an episode, which produces a large number of training samples. Within a limited number of iterations, agents can explore only part of the space, so the learned strategy easily falls into a sub-optimum. As a result, algorithms cannot adapt to complex state-space scenarios, and training efficiency decreases. Second, these algorithms train their strategies with neural networks. When the number of agents is expanded to meet the needs of complex state-space scenarios, the number of network parameters increases significantly and the algorithms become difficult to converge, which limits the scalability of multi-agent algorithms in such scenarios.

Focusing on the multi-agent collaboration problem in complex state-space scenarios, this paper studies the reduced efficiency and limited scalability of multi-agent reinforcement learning methods. We utilize state-space optimization, network parameter sharing, and other methods to improve the ability of reinforcement learning methods to cope with complex state-space scenarios. The specific research work is as
follows:

(1) Designing an improved Actor-Critic multi-agent cooperative strategy optimization algorithm. We analyze the characteristics of complex state-space scenes: the state space is large, and the number of subtasks the agents must complete exceeds that of classic scenes. In this case, agents cannot fully explore the large state space, and training efficiency drops. To address this issue, we propose a multi-agent reinforcement learning training optimization method based on small-scale training and large-scale execution. Relying on group foraging, a typical complex state-space collaborative scene, the training method is optimized on the basis of the Actor-Critic method. We reduce the repeated state space, which allows agents to explore more valuable states in fewer iterations. While reducing the pressure of algorithm training, this improves algorithm efficiency in time-limited tasks.

(2) Designing a policy parameter sharing method to improve the scalability of multi-agent systems. Multi-agent deep reinforcement learning algorithms under the centralized-training, decentralized-execution paradigm use deep neural networks to learn strategies during training. The number of model parameters to be updated grows as the number of agents expands, which increases the difficulty of training. We aim to alleviate this problem by introducing a policy parameter sharing mechanism. It keeps the number of policy models to be updated constant and reduces the total number of parameters updated by the networks during training. This method also allows an agent's useful historical experience to be passed on to the other agents. While the agents' learning speed is balanced, each agent can still maintain the uniqueness of its own strategy, thereby producing a more efficient training model. In this paper, agent strategies are synchronized by soft updates, which encourages the exploratory behavior of agents in
the early stage of training and improves the scalability of multi-agent systems. The method achieves better results as the number of agents increases.

Based on the above work, this paper mainly relies on group foraging, a typical complex state-space scenario, to conduct experiments that evaluate and verify the effectiveness of the proposed methods, improving the performance of multi-agent systems in complex state-space scenarios.
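The state-space reduction idea behind contribution (1) can be illustrated with a minimal sketch. This is not the paper's implementation: the assumption that mirror-symmetric states of a foraging grid are equivalent, the tuple state encoding, and the `canonical` helper are all hypothetical choices made only to show how collapsing repeated states shrinks the space agents must explore.

```python
# Sketch of reducing a repeated state space (illustrative only).
# Assumption: in a grid-like foraging scene, states that are mirror images
# of each other are equivalent for the policy, so mapping every state to a
# canonical representative shrinks the space the agents must explore.

def canonical(state):
    """Map a state (a tuple of cell values) to a canonical representative:
    the lexicographically smaller of the state and its mirror image."""
    mirrored = tuple(reversed(state))
    return min(state, mirrored)

raw_states = [(0, 1, 2), (2, 1, 0), (1, 1, 1), (0, 0, 2)]
reduced = {canonical(s) for s in raw_states}
# (0, 1, 2) and (2, 1, 0) collapse into one entry, so the agents see
# 3 canonical states instead of 4.
```

Any symmetry that the task genuinely has (mirroring, rotation, agent permutation) could play the role of the mirror here; the point is that exploration budget is spent on distinct canonical states rather than on duplicates.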
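The soft-update synchronization named in contribution (2) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not specify: the blending coefficient `tau`, the use of the element-wise mean of all agents' parameters as the shared target, and the flat parameter-vector representation are all hypothetical.

```python
# Sketch of policy parameter sharing via soft updates (illustrative only).
# Assumption: each agent keeps its own parameter vector, and all agents are
# periodically blended toward a shared (here: averaged) vector, so early
# strategies stay diverse and exploratory while gradually converging.

def mean_params(all_params):
    """Element-wise mean of the agents' parameter vectors (shared target)."""
    n = len(all_params)
    return [sum(p[i] for p in all_params) / n for i in range(len(all_params[0]))]

def soft_update(own, shared, tau=0.1):
    """theta_i <- (1 - tau) * theta_i + tau * theta_shared.
    A small tau keeps each agent's strategy mostly its own early in training,
    preserving exploration; larger tau pulls agents together faster."""
    return [(1.0 - tau) * o + tau * s for o, s in zip(own, shared)]

# Toy example: three agents with 2-dimensional "policy parameters".
agents = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
shared = mean_params(agents)                      # [2.0, 3.0]
agents = [soft_update(p, shared) for p in agents]
# Each agent moves 10% of the way toward the shared parameters per update,
# while one shared target keeps the number of synchronized models constant.
```

Because every agent is blended toward a single shared target rather than training a fully independent model, the total number of effectively distinct parameters grows much more slowly as agents are added, which is the scalability effect the method aims for.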