
Research On Multi-Agent Cooperative Strategy Learning And Training Under Interference Environment

Posted on: 2021-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zhang
GTID: 2518306566491624
Subject: Computer application technology
Abstract/Summary:
With the rapid development of artificial intelligence and automation technology, robot systems are trending toward larger scale, networking, and clustering. Compared with a single robot, multiple agents can improve task-execution efficiency through cooperative behavior, and they significantly enhance the system's survivability in confrontation scenarios and its adaptability to complex environments. To realize multi-agent collaborative learning and improve strategy training, this thesis proposes a set of multi-agent collaborative strategy learning methods. These methods build models, generate behavior-strategy trajectories, and optimize training for homogeneous or heterogeneous systems, enabling autonomous decision-making in dynamic clusters. The main research contents and innovations are as follows.

(a) A multi-domain collaborative modeling method based on optimization theory is proposed. To construct scenes more reasonably and completely, while keeping the model simple and allowing scene data to be reused, this thesis proposes a modeling method based on a homogeneous multi-agent hunting scenario. First, the physical model of each agent is simplified in order to construct perception and communication models. Then, according to how tight the constraint conditions are, the task is divided into four scenarios of progressively increasing difficulty. Finally, an algorithm named DFS-DG (Depth-First Search Dynamic Grouping) is proposed, which groups agents by judging the relationships between them. On this basis, the relationships among geographic factors, electromagnetic interference, and the agents are modeled. The agents' observations, processed through dynamic grouping, then serve as input to the reinforcement learning algorithm so that an effective encirclement strategy can be learned. A simulation demonstration is also carried out.

(b) An end-to-end cooperative strategy learning algorithm based on multi-agent reinforcement learning is proposed. In this part, we propose an end-to-end cooperation model with intrinsic reward learning (E2E-IRL) for heterogeneous multi-agent systems. Its purpose is to improve the system's adaptability in dynamic, complex scenes and to accelerate training convergence in heterogeneous scenarios. Combining a graph neural network with an attention mechanism, the algorithm consists of three parts. First, the relationships between heterogeneous agents are learned by a graph neural network, and their weights are computed by an attention mechanism. Second, an adaptive reward function network (with a mathematical derivation of its monotonicity) is designed to generate reward values according to the environment state. Finally, a two-level optimization algorithm is proposed within the centralized training with decentralized execution (CTDE) architecture. Experiments and visual analysis are carried out in a standard environment (the StarCraft II game test environment).

(c) Two training optimization methods based on transfer learning are designed. To further improve the learning efficiency of the reinforcement learning strategy model and reduce computation time, this thesis proposes two scenario-curriculum transfer designs for scenario curriculum learning (SCL) based on transfer learning. First, the curriculum is designed: courses are arranged following the easy-to-difficult sequence of scenarios from the first part, and training on the initial task is completed. Then, the model trained on the previous task is used as the starting point for the next course task, and training continues until learning on the final scenario is complete. Experiments and mathematical derivations for the two curriculum designs show that the method is reasonable and efficient.
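Part (a) names DFS-DG, but the abstract does not say how two agents are judged to be related. The following is a minimal sketch, assuming (hypothetically) that two agents are related when they lie within a communication range that electromagnetic interference shrinks, and that a dynamic group is a connected component of that relation, found with an iterative depth-first search. The names `dfs_dynamic_grouping`, `linked`, `comm_range` and `interference` are illustrative and not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple

Position = Tuple[float, float]


def linked(a: Position, b: Position, comm_range: float, interference: float) -> bool:
    """Hypothetical relation: two agents are linked if they are within the
    communication range after it has been shrunk by interference in [0, 1)."""
    effective_range = comm_range * (1.0 - interference)
    return math.dist(a, b) <= effective_range


def dfs_dynamic_grouping(
    positions: Sequence[Position], comm_range: float, interference: float = 0.0
) -> List[List[int]]:
    """Partition agents into groups, i.e. connected components of the link graph."""
    n = len(positions)
    # Build adjacency lists from the (assumed) pairwise relation.
    adj: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if linked(positions[i], positions[j], comm_range, interference):
                adj[i].append(j)
                adj[j].append(i)

    visited = [False] * n
    groups: List[List[int]] = []
    for start in range(n):
        if visited[start]:
            continue
        # Iterative depth-first search collects one connected component.
        stack = [start]
        visited[start] = True
        group: List[int] = []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in adj[node]:
                if not visited[nb]:
                    visited[nb] = True
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    agents = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.5, 10.0)]
    print(dfs_dynamic_grouping(agents, comm_range=1.5))
    print(dfs_dynamic_grouping(agents, comm_range=1.5, interference=0.5))
```

Re-running the grouping at every time step, as positions and interference change, yields groups that split and merge dynamically; `linked` is the piece one would replace with the thesis's actual relationship criterion.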
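Part (b) says the relationships between heterogeneous agents are learned by a graph neural network and weighted by an attention mechanism. The sketch below illustrates only that generic idea, a single scaled dot-product attention step restricted to graph edges, and is not the E2E-IRL architecture. It assumes identity query/key/value projections (a trained layer would learn them) and ignores agent heterogeneity; `graph_attention_step` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention_step(
    features: Sequence[Vector], neighbors: Sequence[Sequence[int]]
) -> List[Vector]:
    """One attention-weighted aggregation over each agent's neighbourhood.

    Assumption: queries, keys and values are the raw features (identity
    projections); a real GNN attention layer would learn these projections.
    """
    d = len(features[0])
    out: List[Vector] = []
    for i, h_i in enumerate(features):
        # Each agent attends to itself plus its graph neighbours.
        nbrs = [i] + [j for j in neighbors[i] if j != i]
        scores = [dot(h_i, features[j]) / math.sqrt(d) for j in nbrs]
        weights = softmax(scores)
        # Weighted sum of neighbour features, one component at a time.
        agg = [sum(w * features[j][k] for w, j in zip(weights, nbrs)) for k in range(d)]
        out.append(agg)
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    nbrs = [[1, 2], [0], []]  # agent 2 has no neighbours in this toy graph
    for row in graph_attention_step(feats, nbrs):
        print([round(x, 3) for x in row])
```

Because the weights come from a softmax, every output is a convex combination of the features in that agent's neighbourhood, so an agent with no neighbours simply keeps its own features.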
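Part (c) trains on scenarios ordered from easy to difficult and initializes each stage with the model trained in the previous one. The following sketch shows only that control flow. As an assumption for illustration, the "model" is a single number fitted by gradient descent to a per-scenario target rather than a reinforcement learning policy, and the `Scenario` fields, targets, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scenario:
    name: str        # hypothetical scenario label
    difficulty: int  # used only to order the curriculum
    target: float    # toy stand-in for what the policy must learn


def train_stage(param: float, scenario: Scenario, lr: float = 0.1,
                tol: float = 1e-3, max_steps: int = 10_000) -> Tuple[float, int]:
    """Toy 'training': gradient descent on (param - target)^2 until within tol."""
    steps = 0
    while abs(param - scenario.target) > tol and steps < max_steps:
        grad = 2.0 * (param - scenario.target)
        param -= lr * grad
        steps += 1
    return param, steps


def run_curriculum(scenarios: List[Scenario],
                   init: float = 0.0) -> Tuple[float, List[Tuple[str, int]]]:
    """Train easy-to-hard, warm-starting each stage from the previous model."""
    param = init
    log: List[Tuple[str, int]] = []
    for sc in sorted(scenarios, key=lambda s: s.difficulty):
        param, steps = train_stage(param, sc)  # transfer: reuse the trained param
        log.append((sc.name, steps))
    return param, log


if __name__ == "__main__":
    stages = [
        Scenario("final", 4, 4.0),
        Scenario("easy", 1, 1.0),
        Scenario("hard", 3, 3.0),
        Scenario("medium", 2, 2.0),
    ]
    param, log = run_curriculum(stages)
    print(log)
    _, cold_steps = train_stage(0.0, stages[0])  # final scenario from scratch
    print("final stage, warm start:", log[-1][1], "cold start:", cold_steps)
```

Because each stage starts from the previous stage's parameter, the final stage here begins near its target and needs fewer steps than training it from scratch. In this toy the total over all stages is not smaller, so any efficiency claim for real curricula rests on the thesis's own experiments.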
Keywords/Search Tags: Multi-domain, Reinforcement Learning, Graph Neural Network, Attention Mechanism, Curriculum Learning