Research On Multiagent Cooperation And Applications Based On Reinforcement Learning

Posted on: 2022-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S X Li
Full Text: PDF
GTID: 1488306731997819
Subject: Information and Communication Engineering
Abstract/Summary:
With the continuous development of artificial intelligence technologies, intelligent equipment and systems are widely used in daily life. Research breakthroughs in multi-agent systems can improve daily life further as more and more tasks are carried out by intelligent agents. A multi-agent system is characterized by autonomy, distribution, and coordination, and it is capable of self-organization, learning, and reasoning. Compared with the single-agent setting, research on multi-agent systems mainly faces challenges of group scale, partial observability, failure of control information transmission, and environmental uncertainty. This dissertation analyzes the multi-agent cooperation problem and uses deep reinforcement learning to investigate learning efficiency, communication mechanisms, behavior consensus, and efficient knowledge transfer among agents. To evaluate the proposed algorithms, it develops a new passive location environment based on multi-agent reinforcement learning. The dissertation is expected to provide theoretical and application support for realizing more intelligent multi-agent systems. Its main contributions and innovations are summarized as follows:

1. To address the low sample efficiency of reinforcement learning in single-agent scenarios, a sample-efficient policy optimization algorithm with a trajectory-based prioritized double experience buffer is proposed. Because different transitions have different learning value, in an episodic reinforcement learning task the importance of each state transition is evaluated over the whole trajectory of its episode, and transitions are assigned different weights when training data are sampled. Giving higher sampling priority to more “important” samples improves sample efficiency. In addition, to address the time cost of interactions between agents and the environment, an asynchronous parallel sample collection method is introduced, which decouples agents’ acting from learning and alleviates the bottleneck that individual interaction time imposes on overall learning time. A soft update strategy is also used to stabilize training. Experimental results on robot control tasks in MuJoCo show that the proposed algorithm achieves higher sample efficiency and a more stable training process than similar existing algorithms.

2. To address the credit assignment problem among agents in multi-agent cooperation tasks, a multi-agent cooperation algorithm based on global critic decomposition is proposed. When multiple agents in the same environment share a single global scalar reward, a reasonable credit assignment scheme is vital to effective cooperation. Methods based on value function decomposition can solve the credit assignment problem implicitly through gradient backpropagation. Within the framework of global critic function decomposition, a new counterfactual value function estimation method is proposed to improve the accuracy with which each agent’s contribution is evaluated.

3. To address the challenges of partially observable environments, a multi-agent communication mechanism based on a mutual information constraint is proposed. Under partial observability, each agent can access only local information, so the assumption of a stationary Markov decision process no longer holds. Communication lets agents obtain more information about the environment by integrating other agents’ observations and actions, which leads to better decisions. Communication also raises new questions, however: when to communicate, with whom, and what to send. This dissertation designs an adaptive communication module based on a mutual information constraint, which reduces the communication overhead between agents without sacrificing efficiency. To verify the effectiveness of the proposed method, a passive location environment based on multi-agent reinforcement learning is developed. In this new domain, multiple agents must adjust their positions cooperatively to find geometries that improve passive positioning accuracy. The empirical results show that the communicating agents learn effective cooperation strategies for passive location tasks while the communication overhead between agents remains well controlled.

4. To address the scalability and robustness of multi-agent systems, a fully distributed multi-agent policy gradient algorithm based on message diffusion is proposed. As the number of agents grows, centralized algorithms may fail to collect observations and transmit control information. Moreover, a multi-agent system that relies on a central node has low reliability. This dissertation assumes that all agents are nodes of a communication network and that each agent can exchange information with its neighbors. In this setting, the information obtained by each agent spreads through the network by diffusion, and each agent integrates its own observations with the information received from its neighbors to make better decisions. In both training and execution, each agent acts in a completely decentralized way, which benefits the scalability and reliability of the system. The convergence of the proposed algorithm is analyzed with stochastic approximation theory and is proved under several reasonable assumptions and linear function approximation. Furthermore, when the value function and policy are represented by artificial neural networks, the algorithm still converges to an effective policy in practice.

5. To address the knowledge transfer efficiency of multi-agent reinforcement learning, a multi-agent policy reuse algorithm based on a self-attention mechanism is proposed. Reinforcement learning requires a large number of samples generated by interacting with the environment, and the resulting time and economic costs may greatly limit its application. This dissertation therefore uses the knowledge that agents have acquired in existing tasks to accelerate learning on a similar target task. It assumes that agents are already able to complete tasks related to the target task and that these capabilities are stored in the agents’ policy functions, called source policies. To reuse the knowledge in the source policies, a novel policy embedding method based on state distribution is proposed. When facing a new, similar task, the agent integrates its existing policies through a self-attention mechanism to make a combined decision. Empirical results show that the proposed multi-policy reuse method improves start-up performance in training and reaches a final converged performance that exceeds the one obtained without policy reuse.
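The attention-based integration of source policies described in contribution 5 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the dissertation: the function names `attention_weights` and `mix_policies`, the two-dimensional embeddings, the use of a single query derived from the current state, and the choice to combine source policies as a weighted mixture of their action distributions. The abstract does not specify the actual architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention of one query over several keys.

    query: embedding of the current situation (hypothetical).
    keys:  one embedding per source policy (hypothetical).
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)


def mix_policies(weights, action_probs):
    """Convex combination of the source policies' action distributions."""
    n_actions = len(action_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(weights, action_probs))
        for a in range(n_actions)
    ]


# Illustrative data only: three source policies, two actions.
state_embedding = [1.0, 0.0]
policy_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
source_action_probs = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

weights = attention_weights(state_embedding, policy_embeddings)
mixed = mix_policies(weights, source_action_probs)
print("attention weights:", weights)
print("mixed action distribution:", mixed)
```

In this sketch the source policy whose embedding is most aligned with the query receives the largest weight, so its action preferences dominate the mixed distribution. Because the weights are non-negative and sum to one, the result is still a valid probability distribution over actions.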
Keywords/Search Tags: Multiagent System, Reinforcement Learning, Deep Reinforcement Learning, Value Function, Policy Function, Actor-Critic, Policy Gradient, Stochastic Approximation, Knowledge Transfer, Policy Reuse, Self-Attention Mechanism