
Research Of Multi-robot Pursuit Problem

Posted on: 2014-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: T X Lan
Full Text: PDF
GTID: 2268330422951503
Subject: Control Science and Engineering
Abstract/Summary:
The multi-robot pursuit problem provides an ideal platform for studying coordination and collaboration among robots. Applying reinforcement learning to the pursuit problem enables a multi-robot system to actively explore its environment, adapt to it, and improve its own performance and stability. However, directly applying a standard reinforcement learning algorithm to a multi-robot system causes exponential growth of the state space, which slows the algorithm's convergence and limits its practical use. This dissertation is devoted to reducing the size of the state space and improving the convergence speed of the learning algorithm. The main contents are as follows.

Firstly, the basic framework and mathematical model of reinforcement learning are presented, and common reinforcement learning algorithms and their workflows are surveyed. The multi-robot pursuit problem is then described, including state and action abstraction and the definition of the reward function. Because the traditional state abstraction method produces duplicate states that are essentially identical, a dynamic ID state abstraction method is proposed, which reduces the size of the state space; comparisons with the traditional method under the standard Q-learning algorithm are also presented.

Secondly, the basic principles of hierarchical reinforcement learning are given. A state space decomposition method is proposed to divide the original state space into multiple parts: the OPTION-Q learning algorithm disperses the optimal policy across the sub-spaces, which reduces the size of the policy space and accelerates convergence. A comparison between OPTION-Q and standard Q-learning, both using dynamic ID state abstraction, is made.

Thirdly, value function decomposition is developed to improve the OPTION-Q learning algorithm. The state value function of each OPTION-Q sub-task is decomposed into two parts, so that the repeated part can be invoked and reused many times, which improves the convergence speed. A detailed comparison between the improved OPTION-Q and the original OPTION-Q, both based on dynamic ID state abstraction, is presented.
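To make the first contribution concrete: the abstract gives no code, so the following minimal Python sketch shows one plausible reading of dynamic ID state abstraction combined with standard tabular Q-learning. The sorting-based abstraction, the grid actions, and the parameter values are assumptions made for the sketch, not details taken from the thesis.

    import random
    from collections import defaultdict

    ACTIONS = ['north', 'south', 'east', 'west', 'stay']
    alpha, gamma, epsilon = 0.1, 0.9, 0.1   # assumed learning parameters

    def abstract_state(self_pos, teammates, prey_pos):
        # Dynamic-ID idea (as read from the abstract): identify teammates
        # by sorted relative position rather than a fixed robot index, so
        # configurations that differ only by renumbering the pursuers
        # collapse into a single abstract state.
        rel = sorted((x - self_pos[0], y - self_pos[1]) for x, y in teammates)
        prey_rel = (prey_pos[0] - self_pos[0], prey_pos[1] - self_pos[1])
        return (tuple(rel), prey_rel)

    Q = defaultdict(float)   # tabular Q-values keyed by (state, action)

    def choose_action(state):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def q_update(s, a, r, s_next):
        # standard one-step Q-learning update on the abstracted space
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Because permuted pursuer configurations share one abstract state, each update covers what would otherwise be several distinct table entries, which is the mechanism behind the claimed state-space reduction.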
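The OPTION-Q algorithm in the second part belongs to the options framework of hierarchical reinforcement learning. Below is a hedged sketch of the semi-Markov (SMDP) Q-learning update that the options framework prescribes, reusing the tabular representation above; the env.step interface and the option object's methods are hypothetical.

    def run_option(env, s, option, gamma=0.9):
        # Execute the option's internal policy until its termination
        # condition fires; accumulate the discounted reward along the way.
        total, steps, discount = 0.0, 0, 1.0
        while not option.terminates(s):
            a = option.policy(s)
            s, r = env.step(a)     # assumed environment interface
            total += discount * r
            discount *= gamma
            steps += 1
        return total, steps, s

    def option_q_update(Q, s, option, options, env, alpha=0.1, gamma=0.9):
        # SMDP Q-learning: an option that ran k primitive steps is
        # discounted by gamma**k when bootstrapping from the next state.
        r_total, k, s_next = run_option(env, s, option)
        best_next = max(Q[(s_next, o)] for o in options)
        Q[(s, option)] += alpha * (r_total + gamma ** k * best_next
                                   - Q[(s, option)])
        return s_next

Dispersing the policy in this way means each option's internal policy only has to cover the states reachable within its own sub-space, which is the source of the convergence speedup claimed above.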
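The third part describes the value function decomposition only as a split into "two parts", one of which is reusable. A MAXQ-style split is one standard way to realize such a scheme, so the sketch below uses it purely as an illustrative stand-in for the thesis's own decomposition.

    # MAXQ-style decomposition (illustrative stand-in, not necessarily
    # the thesis's exact scheme): the value of invoking sub-task 'sub'
    # inside parent task 'task' splits into V, the reward earned inside
    # 'sub' itself (reusable wherever 'sub' is invoked), and C, a
    # completion term specific to the parent context.
    def decomposed_value(V, C, task, s, sub):
        return V[(sub, s)] + C[(task, s, sub)]

Because V[(sub, s)] is shared by every parent task that invokes the sub-task, experience gathered under one parent updates the value used by all of them, which is how reusing the repeated part can speed up convergence.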
Keywords/Search Tags:Multi-robot pursuit problem, Hierarchical reinforcement learning, OPTION-Q algorithm, Value function decomposition