
Multi-agent Cooperative Algorithm Research Based On Proximal Policy Optimization

Posted on: 2024-04-04  Degree: Master  Type: Thesis
Country: China  Candidate: Y O Shen  Full Text: PDF
GTID: 2568307136989539  Subject: Control Science and Engineering
Abstract/Summary:
In order to cope with increasingly complex, changeable, and uncertain multi-agent task environments, and to meet the demands of high-dimensional data input and sparse rewards, deep reinforcement learning, with its strong data-processing capability and superior learning ability, has become a hot topic in the field of multi-agent cooperation. This thesis adopts the Multi-Agent Proximal Policy Optimization (MAPPO) framework and uses an improved advantage function, a noise mechanism, and an attention mechanism to address problems such as overfitting and credit assignment, respectively, and verifies the methods in mainstream multi-agent task and competition environments. The main contributions are as follows:

1) Building on the MAPPO framework, and in view of the overfitting problem and the premature convergence of the variance in PPO, this thesis combines a recurrent neural network with the PPO algorithm and computes updates jointly with weighted historical data. Optimization in continuous action spaces is supplemented through the basic policy-gradient loss function, and actions with a negative advantage function are mirrored. Experiments show that removing the specific input of MAPPO-FP and adding the noise mechanism can likewise improve the algorithm's exploration and obtain a higher win rate, and that in some tasks a network layer can serve as a noise layer after parameterization.

2) A noise mechanism is introduced to address credit assignment in multi-agent systems and the limited exploration in the late stage of training. Although the mirror operation can reduce overfitting, it increases the instability of the system and applies only to continuous action spaces. In this part, noise is added to the advantage function, the value-network input, and the network structure, respectively, to address these problems. Experiments show that removing the specific input of the original MAPPO and adding the noise mechanism can also achieve better results.

3) An attention mechanism is used to address the excessively high input dimension of the multi-agent system and the low computational efficiency of online updates in complex environments. The recurrent neural network and the noisy network increase the complexity of the input data and the number of network layers; when the task environment is complex and the number of agents is large, the dimension of the network input grows rapidly (dimension explosion). This thesis proposes adding an attention module before the inputs of the policy network and the value network, respectively, so that the network's computation focuses on the key information. Experiments show that the attention mechanism effectively improves the efficiency of the algorithm and achieves better results.
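The abstract gives no formulas; as a rough NumPy sketch, the clipped surrogate objective of standard PPO, which MAPPO extends to the multi-agent centralized-critic setting, can be written as follows (the function name and the clipping range eps=0.2 are illustrative, not taken from the thesis):

```python
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=0.2):
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = np.exp(log_probs_new - log_probs_old)
    # Pessimistic minimum of the unclipped and clipped surrogate terms
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Negated because optimizers minimize; PPO maximizes the surrogate
    return -np.mean(np.minimum(unclipped, clipped))
```

The clip keeps each update close to the old policy, which is one reason the variance of the learned policy can collapse prematurely, the issue the thesis's recurrent-network and mirroring modifications target.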
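The thesis adds noise to the advantage function, the value-network input, and the network structure, but the exact scheme is not given in this abstract. One common way to "parameterize a network layer as a noise layer" is a noisy linear layer with factorized Gaussian noise; the sketch below is a generic NumPy illustration under that assumption (class name, sigma0 initialization, and the sign-sqrt noise transform are conventions, not the thesis's implementation):

```python
import numpy as np

class NoisyLinear:
    """Linear layer whose weights carry learnable Gaussian noise (assumed scheme)."""

    def __init__(self, in_dim, out_dim, sigma0=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(in_dim)
        # Learnable means, initialized like an ordinary linear layer
        self.mu_w = self.rng.uniform(-bound, bound, (out_dim, in_dim))
        self.mu_b = self.rng.uniform(-bound, bound, out_dim)
        # Learnable noise scales
        self.sigma_w = np.full((out_dim, in_dim), sigma0 / np.sqrt(in_dim))
        self.sigma_b = np.full(out_dim, sigma0 / np.sqrt(in_dim))

    @staticmethod
    def _f(x):
        # sign(x) * sqrt(|x|): the usual transform for factorized noise
        return np.sign(x) * np.sqrt(np.abs(x))

    def forward(self, x, noisy=True):
        if not noisy:
            # Evaluation mode: plain linear layer using only the means
            return x @ self.mu_w.T + self.mu_b
        out_dim, in_dim = self.mu_w.shape
        eps_in = self._f(self.rng.standard_normal(in_dim))
        eps_out = self._f(self.rng.standard_normal(out_dim))
        # Factorized noise: one input vector and one output vector per forward pass
        w = self.mu_w + self.sigma_w * np.outer(eps_out, eps_in)
        b = self.mu_b + self.sigma_b * eps_out
        return x @ w.T + b
```

Because the noise scales are part of the layer, exploration is driven by the network itself rather than by an external schedule, which matches the abstract's goal of improving exploration late in training.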
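The attention module placed before the policy-network and value-network inputs is likewise not specified here; a minimal sketch of scaled dot-product attention pooling over per-agent feature vectors, one plausible form of such a module, might look like this (all names are illustrative):

```python
import numpy as np

def attention_pool(query, keys, values):
    """Pool a set of per-agent features into one vector weighted by relevance.

    query:  (d,)   feature vector of the deciding agent (assumed setup)
    keys:   (n, d) feature vectors of the n observed agents/entities
    values: (n, m) information to aggregate from those entities
    """
    d = query.shape[-1]
    # Scaled dot-product scores, softmax-normalized (max-shifted for stability)
    scores = keys @ query / np.sqrt(d)
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Weighted sum: output dimension is fixed regardless of n
    return weights @ values, weights
```

Because the pooled output has a fixed size however many entities are observed, a front-end like this is one way to blunt the "dimension explosion" the abstract describes as the number of agents grows.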
Keywords/Search Tags:multi-agent cooperation, proximal policy optimization, recurrent neural network, noisy network, attention mechanism