Multi-agent systems are a branch of distributed artificial intelligence with wide applications in the collaboration of unmanned systems, resource management, formation control, and other fields. In recent years, deep reinforcement learning has been applied to multi-agent systems because of its advantages in handling open environments and in transferring knowledge. However, multi-agent deep reinforcement learning faces the following problems. (1) The obsolete experience problem in the training process. The training of deep reinforcement learning relies on historical data, but in multi-agent scenarios the policies of all agents evolve simultaneously, so the historical data in one agent's experience replay memory no longer reflects the current policies of the other agents and becomes obsolete. Obsolete experience destabilizes the training process and slows its convergence. (2) The open environment problem of the running system. A multi-agent system is necessarily open: old agents exit and new agents enter the system, and there is little work on how to guide new agents to learn a collaborative strategy quickly.

In response to these challenges, this project carries out the following research: using multiple workers to sample training data from parallel worlds to avoid the obsolete experience problem; integrating an experience replay memory into this method to overcome its low data utilization; and using an advising method to help new agents accelerate their learning. Specifically, the project comprises the following three works.

(1) A multi-agent synchronous reinforcement learning method based on parallel worlds. To avoid the obsolete experience problem, multiple workers sample training data from parallel worlds, and the n-step method is further incorporated to accelerate the propagation of reward. Because this method does not reuse training data, it avoids the obsolete experience problem (a minimal sketch of the sampling scheme appears after this summary).

(2) A multi-agent synchronous reinforcement learning method based on parallel worlds with proper use of historical experience. In a multi-agent system, recent historical experience still reflects the dynamics of the current system to some extent, which is helpful for training. The project therefore integrates the parallel-world-based multi-agent synchronous reinforcement learning with an experience replay memory, striking a balance between reducing the influence of obsolete experience and improving data utilization, which helps reduce training time (see the second sketch below).

(3) An advising-based method for fast generation of collaborative strategies. When new agents enter the system, agents that already have strategies can guide the new agents to learn their policies quickly. Based on this idea, the project gives a method for generating collaborative suggestions and proves the optimality of the method, and then proposes an advising-based hard-suggestion-accept distribution policy generation method and an advising-based soft-suggestion-accept distribution policy generation method. In the hard-suggestion-accept method, new agents accept suggestions only in the early stage of training and execute each suggestion immediately after it is received; in the soft-suggestion-accept method, new agents accept suggestions throughout training and execute them according to a distribution, which is better suited to problems with multiple optimal solutions (see the third sketch below).
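As a rough illustration of work (1), the following Python sketch shows synchronous sampling from parallel worlds with n-step returns. It is a minimal sketch under assumed interfaces: `world.observe`, `world.step`, `world.reset`, and `joint_policy` are hypothetical placeholders rather than the project's actual API, and the bootstrap value term of the n-step return is omitted for brevity.

```python
# Minimal sketch: synchronous sampling from parallel worlds with
# n-step returns. All interface names here are hypothetical.

GAMMA = 0.99   # discount factor (illustrative value)
N_STEP = 5     # length of each n-step rollout (illustrative value)

def collect_nstep(worlds, joint_policy):
    """Roll every parallel world forward n steps under the CURRENT joint
    policy, so no collected transition reflects an outdated policy mix."""
    batch = []
    for world in worlds:                        # one worker per world
        traj = []
        obs = world.observe()
        for _ in range(N_STEP):
            acts = joint_policy(obs)            # actions for all agents
            next_obs, reward, done = world.step(acts)  # scalar team reward
            traj.append((obs, acts, reward))
            obs = next_obs
            if done:
                world.reset()
                break
        # n-step return: propagate reward backwards through the rollout
        # (bootstrapping with a value estimate at the cut is omitted here)
        ret = 0.0
        for obs_t, acts_t, reward_t in reversed(traj):
            ret = reward_t + GAMMA * ret
            batch.append((obs_t, acts_t, ret))
    return batch    # consumed by exactly one update, then discarded
```

Because each batch is consumed by a single synchronous update and then discarded, no transition in it can become obsolete.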
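For work (2), one way to read the combination is a short, recency-bounded replay memory mixed with fresh parallel-world samples; since recent transitions still roughly reflect the other agents' current policies, replaying only them limits obsolescence. The capacity and the mixing ratio below are illustrative assumptions, not values from the project.

```python
from collections import deque
import random

RECENT_CAPACITY = 10_000                 # illustrative bound on memory age
replay = deque(maxlen=RECENT_CAPACITY)   # old transitions age out automatically

def build_training_batch(fresh_batch, batch_size):
    """Mix freshly sampled parallel-world data with recently replayed
    data, balancing obsolescence against data utilization."""
    replay.extend(fresh_batch)
    n_fresh = batch_size // 2            # illustrative 50/50 mixing ratio
    reused = random.sample(list(replay), min(len(replay), batch_size - n_fresh))
    return fresh_batch[:n_fresh] + reused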
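For work (3), the two suggestion-acceptance schemes can be sketched as follows; `advisor_suggest`, `own_policy`, the step threshold, and the acceptance probability are hypothetical stand-ins for the project's actual components.

```python
import random

HARD_ADVICE_STEPS = 5_000   # illustrative length of the "early stage"

def hard_accept(step, obs, advisor_suggest, own_policy):
    """Hard scheme: in the early stage of training, execute the advisor's
    suggestion immediately; afterwards act on the agent's own policy."""
    if step < HARD_ADVICE_STEPS:
        return advisor_suggest(obs)
    return own_policy(obs)

def soft_accept(obs, advisor_suggest, own_policy, accept_prob=0.3):
    """Soft scheme: throughout training, follow the suggestion only with
    some probability, preserving exploration when several joint actions
    are equally optimal."""
    if random.random() < accept_prob:
        return advisor_suggest(obs)
    return own_policy(obs)
```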
To evaluate these methods, this project selects a typical benchmark, the coordinated multi-agent object transportation problem, as the experimental environment. The experimental results show that, compared with existing methods, the proposed methods effectively reduce the number of training iterations, shorten the training time, and speed up the strategy learning of newly joined agents.