Research On Multi-agent Deep Reinforcement Learning In Non-globally Knowable Environment

Posted on: 2023-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: R Zang
Full Text: PDF
GTID: 2568306821995979
Subject: Data Science and Technology

Abstract/Summary:
By interacting with the environment, agents use reinforcement learning to optimize their policies so as to maximize rewards or complete specific tasks. Deep reinforcement learning, which combines reinforcement learning with deep learning, not only has powerful feature extraction and representation abilities for perceiving agent attributes and environmental information, but also has strong exploration ability for adapting to the dynamic changes of complex environments, and it performs well on many complex problems. In multi-agent collaborative decision-making tasks in particular, multi-agent deep reinforcement learning has become a research hotspot and is widely applied in fields such as UAV formation coordination, transportation hub control, and intelligent logistics. Multi-agent deep reinforcement learning therefore has important value in both theoretical research and practical application.

In practical application systems, a single agent usually has only local observations: this is the multi-agent system in a non-globally knowable (i.e., partially observable) environment. When completing tasks with high collaboration requirements, close cooperation between agents maximizes the team's interests. However, in a non-globally knowable environment each agent has only limited knowledge of the complex environment and must communicate in order to coordinate. How to enhance the agents' perception of the environment through effective inter-agent communication, and thereby improve the quality of their decisions, is thus an important question in multi-agent systems research. On this basis, this thesis studies the communication strategies that agents learn during multi-agent cooperative decision-making in a non-globally knowable environment, and proposes two multi-agent reinforcement learning methods: one for the effective identification and processing of messages during communication, and one for the optimization of communication resources. The specific research contents are as follows:

(1) To address message redundancy and noise in the communication process, this thesis proposes AMSAC, a multi-agent reinforcement learning method based on attentional message sharing. Specifically, first, on top of the multi-agent actor-critic architecture, an agent message-sharing space is built: agents read and write messages in this shared space to construct global environment information, which solves the lack of inter-agent communication in non-globally knowable, complex tasks. Second, an attention mechanism is established over the message-sharing space to identify important information and process it, improving the message-processing performance of the multi-agent system. Finally, the centralized critic network makes full use of global state and action information and uses a temporal-difference advantage policy gradient to soundly evaluate the value of each agent's actions. Experiments carried out in a multi-agent cooperative-confrontation environment show that AMSAC outperforms the baselines in four different scenarios.
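The abstract describes the message-sharing space and the attention mechanism only at a high level. As a minimal sketch of the idea, not the thesis's implementation (the module layout, the single attention head, and all names and dimensions here are assumptions), each agent could write an encoded message into a shared space and read back an attention-weighted summary of all agents' messages:

```python
import torch
import torch.nn as nn

class AttentionMessageSharing(nn.Module):
    """Hypothetical sketch: agents write messages into a shared space,
    then each agent reads an attention-weighted summary of all messages."""

    def __init__(self, obs_dim: int, msg_dim: int):
        super().__init__()
        self.write = nn.Linear(obs_dim, msg_dim)  # encode observation into a message
        self.query = nn.Linear(obs_dim, msg_dim)  # each agent queries the shared space
        self.key = nn.Linear(msg_dim, msg_dim)
        self.value = nn.Linear(msg_dim, msg_dim)
        self.scale = msg_dim ** 0.5

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim) -> per-agent read result: (n_agents, msg_dim)
        msgs = self.write(obs)                    # messages in the shared space
        q, k, v = self.query(obs), self.key(msgs), self.value(msgs)
        attn = torch.softmax(q @ k.t() / self.scale, dim=-1)  # (n_agents, n_agents)
        return attn @ v                           # attention-weighted read per agent
```

The attention weights play the role the abstract assigns to the mechanism: each agent focuses on the shared messages that matter for its decision instead of treating all messages, including redundant or noisy ones, equally.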
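The abstract does not define the temporal-difference advantage policy gradient further; reading it as an actor update weighted by a one-step TD error computed from the centralized critic (an assumption, and the function below is hypothetical), the core computation is short:

```python
import torch

def td_advantage_actor_loss(log_probs, values, next_values, rewards, gamma=0.99):
    """Sketch: use the centralized critic's one-step TD error as the
    advantage that weights each agent's action log-probability."""
    with torch.no_grad():
        advantage = rewards + gamma * next_values - values  # one-step TD error
    return -(advantage * log_probs).mean()  # minimize = ascend the policy gradient
```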
(2) In view of the good performance of multi-agent value function decomposition methods on the non-stationarity and scalability problems, but facing their inconsistency during decentralized execution, this thesis proposes BESQ, a multi-agent reinforcement learning method based on information-theoretic optimization. On top of the multi-agent value function decomposition architecture, BESQ designs two communication-message regularization optimizers based on information-theoretic techniques and uses them to construct an inter-agent communication resource optimization mechanism, solving the value function decomposition method's lack of coordination during decentralized execution. Specifically, first, to enhance the expressiveness of an agent's communication messages, a regularization optimizer that maximizes the mutual information between an agent's message and its action selection is established, reducing the uncertainty of other agents' action-value functions. At the same time, to keep agents' communication messages succinct, a regularization optimizer that minimizes the entropy of agents' messages is established, so that the messages agents communicate contain the information most important for decision-making. Finally, BESQ realizes the above communication resource optimization mechanism on top of the value function decomposition method Qatten, organically combining value function decomposition with communication learning. Experiments carried out in a multi-agent cooperative-confrontation environment show that BESQ outperforms the baselines in four different scenarios.
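The two regularizers are specified only by their objectives. As a minimal sketch consistent with that description (the variational treatment of the mutual information term, the discrete message distribution, and all names and weights are assumptions, not the thesis's formulation), they could be implemented as auxiliary loss terms:

```python
import torch
import torch.nn.functional as F

def mi_regularizer(action_logits_from_msg, actions):
    """Maximize a variational lower bound on I(message; action):
    train a predictor of the agent's action from its message;
    minimizing this cross-entropy tightens the bound."""
    return F.cross_entropy(action_logits_from_msg, actions)

def message_entropy_regularizer(msg_logits):
    """Minimize H(message) for a discrete message distribution,
    keeping messages succinct (only decision-relevant content)."""
    probs = torch.softmax(msg_logits, dim=-1)
    log_probs = torch.log_softmax(msg_logits, dim=-1)
    return -(probs * log_probs).sum(dim=-1).mean()

# Hypothetical combined objective: the TD loss of the value decomposition
# backbone plus the two weighted information-theoretic terms, e.g.
# loss = td_loss + lambda_mi * mi_loss + lambda_h * entropy_loss
```

Maximizing mutual information pushes messages to be informative about the sender's behavior, while minimizing message entropy pushes them to be compact; together the two terms implement the communication resource trade-off the abstract describes.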
Keywords/Search Tags: multi-agent system, deep reinforcement learning, policy gradient, value function decomposition, attention mechanism, information-theoretic optimization