
Agent-based Research On Learning Methods In Multi-robot System

Posted on: 2017-08-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Q Liu    Full Text: PDF
GTID: 1318330536481031    Subject: Control Science and Engineering
Abstract/Summary:
Compared with a single robot, a multi-robot system (MRS) has many advantages and promising prospects, and has become a hotspot in robotics research. An MRS is also a complicated, dynamic system, so designers cannot specify optimal behaviors in advance when designing control policies for intelligent robots. Behavior-based methods give a multi-robot system a degree of intelligence and let it accomplish more complex tasks, which has greatly promoted MRS development; however, behavior-based methods alone cannot fully meet the requirements of a changing external environment and varied tasks. Giving an MRS the ability to learn actively while avoiding the limitations of any single learning method is an important direction of development, so combining different kinds of learning methods with the behavior-based method has significant research value. In this thesis, agent theory is adopted to study multi-robot systems, and the main contents are as follows.

First, agent and multi-agent system theory is introduced, and several architectures for single robots and multi-robot systems are studied and analyzed. An MRS architecture combining behavior-based and learning-based methods is proposed to explore multi-robot coordination, and the behavior-based method is adopted to design robot formation and robot soccer controllers. Learning ability plays a significant role in many MRS research fields; compared with other methods, the behavior-based method is robust, controls flexibly, and accomplishes tasks well. To combine different machine learning methods, this thesis targets the two main MRS test platforms, robot formation and robot soccer, and uses the behavior-based method to design multi-robot systems in the robot simulation environments MissionLab and TeamBots, on which the algorithms proposed in this thesis are validated.

Second, Particle Swarm Optimization (PSO) and Case-Based Reasoning (CBR) are studied and their respective advantages summarized, and a novel hybrid approach combining PSO and CBR is proposed. Although the traditional behavior-based method has many merits, setting control parameters for each behavior is complex. CBR, an important branch of artificial intelligence, offers easy retrieval and storage, so it is well suited to providing the control parameters for each behavior; however, traditional CBR lacks an efficient learning ability, so PSO serves as an optimizer that helps CBR obtain better cases. Like the Genetic Algorithm (GA), PSO belongs to the family of swarm intelligence algorithms, but PSO has a simpler structure and stronger real-time capability and is better suited to continuous problems; any problem GA can solve, PSO can solve as well. CBR combined with PSO therefore not only overcomes the disadvantages of CBR but also meets real-time and continuous-problem requirements. The validity of this method is verified on robot formation, in comparison with standard PSO.

Third, the basic theory of reinforcement learning and classic Q-learning is investigated, and an improved Q-learning algorithm with experience sharing and a filtering technique is proposed to overcome the disadvantages of traditional Q-learning and improve learning performance and efficiency. Q-learning is founded on the Markov decision process; although applying Q-learning directly in an MRS violates this precondition, Q-learning is simple to operate, keeps the state-action space small, and has been applied in real multi-robot systems. Compared with multi-agent reinforcement learning algorithms, traditional Q-learning lacks communication among agents, so this dissertation proposes experience sharing: under an ε-greedy policy, each agent gradually shares the other agents' Q values, drawing on shared learning experience with probability 1-ε. Meanwhile, the received signal is treated as the sum of the true reward signal and noise, and a Kalman filter is adopted to solve the structural credit assignment of the reward signal and accelerate convergence, replacing the uniform assignment in which all agents receive the same reward. The validity of this method is verified on robot soccer, in comparison with traditional Q-learning.

Finally, the typical multi-agent reinforcement learning algorithms Minimax-Q, Nash-Q, FFQ, and CE-Q, together with regret-theory-based learning methods, are investigated, and a novel CE-Q with a no-regret strategy is proposed to overcome the slow convergence of CE-Q. Markov games provide the fundamental background for multi-agent reinforcement learning. Nash equilibrium plays an important role in Minimax-Q, Nash-Q, FFQ, and CE-Q, so these are also called equilibrium-based learning algorithms. Computing a correlated equilibrium is simpler than computing the Nash equilibrium required by Nash-Q, so CE-Q has better application prospects; however, traditional CE-Q lacks an efficient exploration policy, which limits its convergence rate. Inspired by the no-regret strategy, if each agent adopts average-regret minimization as its exploration policy, the joint action of all agents converges to the no-regret set, which is exactly the set of coarse correlated equilibria. This dissertation therefore proposes CE-Q with average regret minimization to accelerate convergence. The validity of this method is verified on robot soccer, in comparison with traditional CE-Q.
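The PSO-as-optimizer idea from the second contribution can be illustrated with a minimal sketch: a standard PSO (inertia weight plus cognitive and social terms) searching for behavior control parameters that minimize a task cost, the role PSO plays when refining CBR cases. The fitness function, bounds, and all coefficients below are illustrative assumptions, not values from the dissertation.

```python
import random

def pso(fitness, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimize `fitness` over a dim-dimensional box with canonical PSO."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # per-particle best position
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # inertia + cognitive pull + social pull
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical use: tune two formation-behavior gains against a surrogate cost
# whose optimum (1.2, 0.4) stands in for well-tuned case parameters.
best, err = pso(lambda p: (p[0] - 1.2) ** 2 + (p[1] - 0.4) ** 2, dim=2)
```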
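The experience-sharing and reward-filtering ideas from the third contribution can be sketched as a tabular Q-learner that (i) denoises each received reward with a scalar Kalman filter, treating the observation as true reward plus Gaussian noise, and (ii) occasionally blends its Q estimate with a teammate's. The blending rule (a simple average with probability 1-ε), the class interface, and all constants are assumptions for illustration; the dissertation's actual update rule may differ.

```python
import random
from collections import defaultdict

class SharingQAgent:
    """Tabular Q-learning with Kalman-filtered rewards and Q-value sharing."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.2,
                 obs_var=1.0, proc_var=0.01):
        self.Q = defaultdict(float)
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.r_hat, self.p = 0.0, 1.0          # Kalman state: estimate, variance
        self.obs_var, self.proc_var = obs_var, proc_var

    def filter_reward(self, z):
        """Scalar Kalman filter: observation z = true reward + noise."""
        self.p += self.proc_var                # predict
        k = self.p / (self.p + self.obs_var)   # Kalman gain
        self.r_hat += k * (z - self.r_hat)     # correct toward observation
        self.p *= (1.0 - k)
        return self.r_hat

    def act(self, s):
        if random.random() < self.eps:         # ε-greedy exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, z, s2, teammate=None):
        r = self.filter_reward(z)              # use denoised reward
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])
        # experience sharing: with probability 1-ε, average in teammate's value
        if teammate is not None and random.random() < 1.0 - self.eps:
            self.Q[(s, a)] = 0.5 * (self.Q[(s, a)] + teammate.Q[(s, a)])
```

Separating the filter from the Q update keeps the standard convergence machinery intact: the learner still performs an ordinary temporal-difference step, just on a smoothed reward estimate rather than the raw noisy signal.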
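The no-regret exploration idea behind the final contribution can be illustrated with regret matching, a classic average-regret-minimizing rule whose empirical play converges to the set of coarse correlated equilibria. The sketch below is deliberately simplified to one learner against a fixed opponent; the payoff function, iteration count, and rock-paper-scissors setting are illustrative assumptions rather than the dissertation's robot-soccer formulation.

```python
import random

def regret_matching(payoff, n_actions, opponent, iters=5000):
    """Play actions with probability proportional to positive cumulative
    regret; return the empirical frequency of each action."""
    regret = [0.0] * n_actions
    counts = [0] * n_actions
    for _ in range(iters):
        pos = [max(r, 0.0) for r in regret]
        total = sum(pos)
        if total > 0:
            # sample an action proportional to its positive regret
            x, acc, a = random.random() * total, 0.0, 0
            for i, p in enumerate(pos):
                acc += p
                if x <= acc:
                    a = i
                    break
        else:
            a = random.randrange(n_actions)    # no regret yet: explore uniformly
        b = opponent()
        u = payoff(a, b)
        for i in range(n_actions):             # regret of not having played i
            regret[i] += payoff(i, b) - u
        counts[a] += 1
    return [c / iters for c in counts]

# Hypothetical rock-paper-scissors payoff: 0 = rock, 1 = paper, 2 = scissors.
def rps(a, b):
    if a == b:
        return 0.0
    return 1.0 if (a - b) % 3 == 1 else -1.0
```

Against an opponent who always plays rock, the learner's positive regret rapidly concentrates on paper, so its empirical play converges to the best response, which is the behavior the improved CE-Q exploits to speed up exploration.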
Keywords/Search Tags: Agent, Multi-robot system, Machine learning, Behavior-based method, Reinforcement learning, Regret theory