
Multi-agent Cooperative Algorithm Research Based On Proximal Policy Optimization

Posted on: 2024-04-04  Degree: Master  Type: Thesis
Country: China  Candidate: Y O Shen  Full Text: PDF
GTID: 2568307136989539  Subject: Control Science and Engineering
Abstract/Summary:
In order to cope with increasingly complex, changeable, and uncertain multi-agent task environments, and to meet the demands of high-dimensional data input and sparse rewards, deep reinforcement learning, with its strong data-processing capability and superior learning ability, has become a hot topic in the field of multi-agent cooperation. This thesis adopts the Multi-Agent Proximal Policy Optimization (MAPPO) framework and uses an improved advantage function, a noise mechanism, and an attention mechanism to address problems such as overfitting and credit assignment, respectively, and verifies the methods in mainstream multi-agent task and competition environments. The main contributions are as follows:

1) Building on the MAPPO framework, and in view of the overfitting problem and the premature convergence of the variance in PPO, this thesis combines a recurrent neural network with the PPO algorithm and computes updates jointly with weighted historical data. Optimization in continuous action spaces is supplemented through the basic policy-gradient loss function, and actions with a negative advantage function are mirrored. Experiments show that removing the specific input of MAPPO-FP and adding the noise mechanism can likewise improve the algorithm's exploration and obtain a higher win rate, and that in some tasks a network layer can serve as a noise layer after parameterization.

2) A noise mechanism is introduced to address credit assignment in multi-agent systems and the limited exploration in the late stage of training. Although the mirror operation can reduce overfitting, it increases the instability of the system and applies only to continuous action spaces. In this part, noise is added to the advantage function, the value-network input, and the network structure, respectively, to address these problems. Experiments show that removing the specific input of the original MAPPO and adding the noise mechanism can also achieve better results.

3) An attention mechanism is used to address the excessively high input dimension of the multi-agent system and the low computational efficiency of online updates in complex environments. The recurrent neural network and the noisy network increase the complexity of the input data and the number of network layers; when the task environment is complex and the number of agents is large, the dimension of the network input grows rapidly (dimension explosion). This thesis proposes adding an attention module before the inputs of the policy network and the value network, respectively, so that the network's computation focuses on the key information. Experiments show that the attention mechanism effectively improves the efficiency of the algorithm and achieves better results.
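The abstract gives no formulas; as a rough NumPy sketch, the clipped surrogate objective of standard PPO, which MAPPO extends to the multi-agent centralized-critic setting, can be written as follows (the function name and the clipping range eps=0.2 are illustrative, not taken from the thesis):

```python
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=0.2):
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = np.exp(log_probs_new - log_probs_old)
    # Pessimistic minimum of the unclipped and clipped surrogate terms
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Negated because optimizers minimize; PPO maximizes the surrogate
    return -np.mean(np.minimum(unclipped, clipped))
```

The clip keeps each update close to the old policy, which is one reason the variance of the learned policy can collapse prematurely, the issue the thesis's recurrent-network and mirroring modifications target.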
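The thesis adds noise to the advantage function, the value-network input, and the network structure, but the exact scheme is not given in this abstract. One common way to "parameterize a network layer as a noise layer" is a noisy linear layer with factorized Gaussian noise; the sketch below is a generic NumPy illustration under that assumption (class name, sigma0 initialization, and the sign-sqrt noise transform are conventions, not the thesis's implementation):

```python
import numpy as np

class NoisyLinear:
    """Linear layer whose weights carry learnable Gaussian noise (assumed scheme)."""

    def __init__(self, in_dim, out_dim, sigma0=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(in_dim)
        # Learnable means, initialized like an ordinary linear layer
        self.mu_w = self.rng.uniform(-bound, bound, (out_dim, in_dim))
        self.mu_b = self.rng.uniform(-bound, bound, out_dim)
        # Learnable noise scales
        self.sigma_w = np.full((out_dim, in_dim), sigma0 / np.sqrt(in_dim))
        self.sigma_b = np.full(out_dim, sigma0 / np.sqrt(in_dim))

    @staticmethod
    def _f(x):
        # sign(x) * sqrt(|x|): the usual transform for factorized noise
        return np.sign(x) * np.sqrt(np.abs(x))

    def forward(self, x, noisy=True):
        if not noisy:
            # Evaluation mode: plain linear layer using only the means
            return x @ self.mu_w.T + self.mu_b
        out_dim, in_dim = self.mu_w.shape
        eps_in = self._f(self.rng.standard_normal(in_dim))
        eps_out = self._f(self.rng.standard_normal(out_dim))
        # Factorized noise: one input vector and one output vector per forward pass
        w = self.mu_w + self.sigma_w * np.outer(eps_out, eps_in)
        b = self.mu_b + self.sigma_b * eps_out
        return x @ w.T + b
```

Because the noise scales are part of the layer, exploration is driven by the network itself rather than by an external schedule, which matches the abstract's goal of improving exploration late in training.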
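The attention module placed before the policy-network and value-network inputs is likewise not specified here; a minimal sketch of scaled dot-product attention pooling over per-agent feature vectors, one plausible form of such a module, might look like this (all names are illustrative):

```python
import numpy as np

def attention_pool(query, keys, values):
    """Pool a set of per-agent features into one vector weighted by relevance.

    query:  (d,)   feature vector of the deciding agent (assumed setup)
    keys:   (n, d) feature vectors of the n observed agents/entities
    values: (n, m) information to aggregate from those entities
    """
    d = query.shape[-1]
    # Scaled dot-product scores, softmax-normalized (max-shifted for stability)
    scores = keys @ query / np.sqrt(d)
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Weighted sum: output dimension is fixed regardless of n
    return weights @ values, weights
```

Because the pooled output has a fixed size however many entities are observed, a front-end like this is one way to blunt the "dimension explosion" the abstract describes as the number of agents grows.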
Keywords/Search Tags:multi-agent cooperation, proximal policy optimization, recurrent neural network, noisy network, attention mechanism