
Research On Multi-Agent Decision-Making Based On Q-Learning In RoboCup Rescue Simulation

Posted on: 2019-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhou
Full Text: PDF
GTID: 2428330566496027
Subject: Control theory and control engineering
Abstract/Summary:
Multi-agent systems (MAS) have long been a hot topic in artificial intelligence research. To address two problems, namely that agents cannot make effective decisions in weak-communication environments and that multi-agent learning encounters the "curse of dimensionality" in large disaster environments, and thereby to improve the learning and decision-making ability of agents, this thesis proposes work in three areas: (1) decision optimization for agents under weak communication based on a dynamic fuzzy decision tree; (2) single-agent Q-learning optimization based on support vector machines (SVM); and (3) multi-agent Q-learning optimization based on experience interaction and reliability allocation. The work and contributions of this thesis are as follows.

(1) Decision optimization under weak communication based on a dynamic fuzzy decision tree: this method reduces the many conditions that an agent's decision depends on to a few important ones, discretizes the fuzzy information caused by poor communication quality, completes the missing information, and constructs a dynamic fuzzy decision tree whose overfitted branches are then pruned, yielding a dynamic fuzzy decision tree for agent decision-making. This solves the problem that agents cannot make correct decisions under weak-communication conditions, where communication information is fuzzy or missing.

(2) Single-agent Q-learning optimization based on support vector machines: this method builds an SVM that fits the function relating the current state-action pair to its Q value, so that the current Q value can be obtained directly by inputting the current state and action. This avoids the problem that a Q-value lookup table cannot be built when the state space is too large and complex. In addition, a time-window mechanism rolls the SVM forward online over time, so that the SVM achieves dynamic online learning, and checking the obtained Q values against the KKT conditions ensures that the SVM always rolls in a more accurate direction.

(3) Multi-agent Q-learning optimization based on experience interaction and reliability allocation: this method has multiple agents update a common Q table to realize multi-agent Q-learning. Modeled on how humans exchange experience, agents draw mostly on their own Q tables early on and, as time goes by, obtain more and more experience from the common Q table. Structural and temporal reliability functions are constructed according to the specific environment, and the global reward signal is distributed to the agents by reliability allocation according to their different contributions.

All three methods are applied to homogeneous agents on the RoboCup Rescue Simulation System (RCRSS) platform, and good results are achieved.
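The SVM-Q idea in contribution (2) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: a plain least-squares linear model trained by SGD stands in for the SVR, the KKT-condition check is omitted, and the class name, window size, learning rate, and feature encoding are all illustrative assumptions.

```python
from collections import deque

class WindowedQModel:
    """Sliding-window Q-value approximator (sketch of the SVM-Q idea).

    Instead of a Q lookup table, recent (state-action features -> Q)
    samples are kept in a rolling time window and a regressor is refit
    on them, so Q(s, a) for any pair is obtained by prediction rather
    than table lookup.
    """

    def __init__(self, n_features, window=64, lr=0.05, epochs=300):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.window = deque(maxlen=window)  # old samples roll off over time
        self.lr = lr
        self.epochs = epochs

    def observe(self, x, q_target):
        """Record a (features, Q target) sample and refit on the window."""
        self.window.append((list(x), float(q_target)))
        self._refit()

    def _refit(self):
        # Stochastic gradient descent on squared error over the window,
        # warm-started from the previous weights.
        for _ in range(self.epochs):
            for x, q in self.window:
                err = self.predict(x) - q
                for i, xi in enumerate(x):
                    self.w[i] -= self.lr * err * xi
                self.b -= self.lr * err

    def predict(self, x):
        """Approximate Q value for a state-action feature vector."""
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
```

Because the window has a fixed length, the model "rolls" forward in time exactly as the abstract describes: stale experience leaves the window and the fit tracks the agent's recent environment.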
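The reliability allocation in contribution (3) can be sketched as a weighted split of the global reward. The function name, the structural weights, and the exponential temporal decay below are illustrative assumptions; the thesis constructs its reliability functions from the specific environment.

```python
import math

def allocate_reward(total_reward, contributions, decay=0.5):
    """Split a global reward among agents by reliability weights.

    `contributions` is one (structural_reliability, time_since_action)
    pair per agent.  Each agent's weight is its structural reliability
    times a temporal reliability exp(-decay * dt); the exponential
    decay is an illustrative stand-in for a time reliability function.
    """
    weights = [s * math.exp(-decay * dt) for s, dt in contributions]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(contributions)
    return [total_reward * w / total for w in weights]
```

For example, two agents with equal structural reliability, one acting at the rewarded moment and one two steps earlier, split a reward of 10 as roughly 7.3 and 2.7 with decay=0.5, so the agent whose action was closer to the rewarded outcome receives more credit.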
Keywords/Search Tags: RoboCup Rescue Simulation, Communication-limited, Dynamic fuzzy decision tree, SVM-Q model, Experience exchange, Reliability distribution function