
Research Of Multi-agent Cooperation Based On Deep Reinforcement Learning

Posted on: 2022-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Huang
GTID: 2518306542979449
Subject: Data Science and Technology
Abstract/Summary:
In recent years, deep learning and reinforcement learning have grown steadily in popularity, and deep reinforcement learning has become a widely used technique in many fields. As an important research direction in multi-agent systems, deep reinforcement learning uses its feature extraction capabilities to extract each agent's surrounding and environmental information, and its perception and exploration capabilities to perceive and adapt to dynamic environments, so that agents can make the best decisions. A problem of wide concern in multi-agent systems is multi-agent cooperation based on deep reinforcement learning, which studies how agents can achieve optimal overall performance through effective collaborative control in a complex and changing environment. This collaboration technology has also been applied in a growing number of fields and has become increasingly mature, for example in unmanned aerial vehicle combat, drone exploration, and aircraft formation. Research on multi-agent cooperation based on deep reinforcement learning is therefore significant in both theoretical value and practical application.

In a multi-agent system, how an agent balances individual interests against team interests is very important. In addition, how to use historical information effectively to improve the utilization of experience is a key challenge for multi-agent cooperation. This thesis studies these issues in depth and, building on existing multi-agent cooperative algorithms based on deep reinforcement learning, completes the following work:

(1) To address the problem of balancing individual and team interests, this thesis proposes an interest-driven hybrid action-value function algorithm (HyCMA-Q). The algorithm uses a parameter, the degree to which an individual agent integrates into the team, to adaptively adjust the proportion of the joint action-value function and the individual action-value function, balancing the interests of the team and of the individual agent. The algorithm is suitable for environments with continuous and discrete state or action spaces. Experiments show that it achieves better performance in cooperative, competitive, and mixed environments.

(2) To address the problem of reusing historical experience, this thesis proposes a multi-agent cooperation algorithm based on historical experience (HyCMA-H). The algorithm modifies the experience replay buffer used for data sampling and introduces historical information into it, such as the state and actions at the previous time step, to help the agent make better decisions. Experiments show that historical information can speed up convergence of the model in the early stage of training. Compared with the baseline models, the algorithm also enables the team to obtain greater benefits.

(3) To build a good experience replay buffer for multi-agent deep reinforcement learning, this thesis proposes a prioritized experience replay algorithm for cooperative multi-agent learning (PEMAC). During the training stage, the algorithm marks the importance of each experience using the proportional prioritization calculated from the TD error, then uses the higher-priority experience data to update the network. Experimental results show that the algorithm improves the quality of the training data, thereby improving the convergence speed and learning efficiency of the model. Its performance in the cooperative treasure hunt and rover-tower environments is also better than that of the baseline algorithm.
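The proportional prioritization mentioned for PEMAC can be illustrated with a minimal sketch of the generic scheme: each stored transition gets a priority from the magnitude of its TD error, and sampling probabilities are proportional to a power of that priority. The abstract does not give PEMAC's multi-agent details, so this is not the thesis's implementation. The function names, the exponent `alpha=0.6`, and the offset `eps` are assumptions chosen for illustration.

```python
import random


def proportional_priorities(td_errors, eps=1e-6):
    """Priority p_i = |delta_i| + eps; eps (assumed value) keeps every priority positive."""
    return [abs(d) + eps for d in td_errors]


def sampling_probabilities(priorities, alpha=0.6):
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 gives uniform sampling."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_batch(buffer, td_errors, batch_size, alpha=0.6, rng=None):
    """Sample transitions (with replacement) in proportion to their priority.

    `buffer` is any sequence of stored transitions; `td_errors` holds one
    TD error per transition. Returns the chosen indices and transitions.
    """
    rng = rng or random.Random()
    probs = sampling_probabilities(proportional_priorities(td_errors), alpha)
    indices = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    return indices, [buffer[i] for i in indices]
```

Calling `sample_batch(buffer, td_errors, batch_size=32)` draws transitions with large TD errors more often, which is the sense in which higher-priority experience is used to update the network. A practical implementation would also refresh the stored TD errors after each update, a step this sketch leaves out.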
Keywords/Search Tags:hybrid action-value function, deep reinforcement learning, multi-agent cooperation, team interests, individual interests, historical information, prioritized experience replay