
Research Of Multi-robot Pursuit Problem

Posted on: 2014-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: T X Lan
Full Text: PDF
GTID: 2268330422951503
Subject: Control Science and Engineering
Abstract/Summary:
The multi-robot pursuit problem provides an ideal platform for studying coordination and collaboration among robots. Applying reinforcement learning to the pursuit problem enables a multi-robot system to actively explore its environment, adapt to it, and improve its own performance and stability. However, directly applying a standard reinforcement learning algorithm to a multi-robot system causes exponential growth of the state space, which slows the algorithm's convergence and limits its practical use. This dissertation is devoted to reducing the size of the state space and improving the convergence speed of the learning algorithm. The main contents are as follows.

Firstly, the basic framework and mathematical model of reinforcement learning are presented, and common reinforcement learning algorithms and their workflows are surveyed. The multi-robot pursuit problem is then described, including state and action abstraction and the definition of the reward function. Because the traditional state abstraction method produces duplicate states that are essentially identical, a dynamic ID state abstraction method is proposed, which reduces the size of the state space; comparisons with the traditional method under the standard Q-learning algorithm are also presented.

Secondly, the basic principles of hierarchical reinforcement learning are given. A state space decomposition method is proposed to divide the original state space into multiple parts: the OPTION-Q learning algorithm disperses the optimal policy across the sub-spaces, which reduces the size of the policy space and accelerates convergence. A comparison between OPTION-Q and standard Q-learning, both using dynamic ID state abstraction, is made.

Thirdly, value function decomposition is developed to improve the OPTION-Q learning algorithm. The state value function of each OPTION-Q sub-task is decomposed into two parts, so that the repeated part can be invoked and reused many times, which improves the convergence speed. A detailed comparison between the improved OPTION-Q and the original OPTION-Q, both based on dynamic ID state abstraction, is presented.
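To make the first contribution concrete: the abstract gives no code, so the following minimal Python sketch shows one plausible reading of dynamic ID state abstraction combined with standard tabular Q-learning. The sorting-based abstraction, the grid actions, and the parameter values are assumptions made for the sketch, not details taken from the thesis.

    import random
    from collections import defaultdict

    ACTIONS = ['north', 'south', 'east', 'west', 'stay']
    alpha, gamma, epsilon = 0.1, 0.9, 0.1   # assumed learning parameters

    def abstract_state(self_pos, teammates, prey_pos):
        # Dynamic-ID idea (as read from the abstract): identify teammates
        # by sorted relative position rather than a fixed robot index, so
        # configurations that differ only by renumbering the pursuers
        # collapse into a single abstract state.
        rel = sorted((x - self_pos[0], y - self_pos[1]) for x, y in teammates)
        prey_rel = (prey_pos[0] - self_pos[0], prey_pos[1] - self_pos[1])
        return (tuple(rel), prey_rel)

    Q = defaultdict(float)   # tabular Q-values keyed by (state, action)

    def choose_action(state):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def q_update(s, a, r, s_next):
        # standard one-step Q-learning update on the abstracted space
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Because permuted pursuer configurations share one abstract state, each update covers what would otherwise be several distinct table entries, which is the mechanism behind the claimed state-space reduction.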
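The OPTION-Q algorithm in the second part belongs to the options framework of hierarchical reinforcement learning. Below is a hedged sketch of the semi-Markov (SMDP) Q-learning update that the options framework prescribes, reusing the tabular representation above; the env.step interface and the option object's methods are hypothetical.

    def run_option(env, s, option, gamma=0.9):
        # Execute the option's internal policy until its termination
        # condition fires; accumulate the discounted reward along the way.
        total, steps, discount = 0.0, 0, 1.0
        while not option.terminates(s):
            a = option.policy(s)
            s, r = env.step(a)     # assumed environment interface
            total += discount * r
            discount *= gamma
            steps += 1
        return total, steps, s

    def option_q_update(Q, s, option, options, env, alpha=0.1, gamma=0.9):
        # SMDP Q-learning: an option that ran k primitive steps is
        # discounted by gamma**k when bootstrapping from the next state.
        r_total, k, s_next = run_option(env, s, option)
        best_next = max(Q[(s_next, o)] for o in options)
        Q[(s, option)] += alpha * (r_total + gamma ** k * best_next
                                   - Q[(s, option)])
        return s_next

Dispersing the policy in this way means each option's internal policy only has to cover the states reachable within its own sub-space, which is the source of the convergence speedup claimed above.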
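The third part describes the value function decomposition only as a split into "two parts", one of which is reusable. A MAXQ-style split is one standard way to realize such a scheme, so the sketch below uses it purely as an illustrative stand-in for the thesis's own decomposition.

    # MAXQ-style decomposition (illustrative stand-in, not necessarily
    # the thesis's exact scheme): the value of invoking sub-task 'sub'
    # inside parent task 'task' splits into V, the reward earned inside
    # 'sub' itself (reusable wherever 'sub' is invoked), and C, a
    # completion term specific to the parent context.
    def decomposed_value(V, C, task, s, sub):
        return V[(sub, s)] + C[(task, s, sub)]

Because V[(sub, s)] is shared by every parent task that invokes the sub-task, experience gathered under one parent updates the value used by all of them, which is how reusing the repeated part can speed up convergence.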
Keywords/Search Tags:Multi-robot pursuit problem, Hierarchical reinforcement learning, OPTION-Q algorithm, Value function decomposition