
Research On Multiagent Reinforcement Learning Algorithm In Continuous Action Space

Posted on: 2019-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: G S Liu
Full Text: PDF
GTID: 2428330626952123
Subject: Software engineering

Abstract/Summary:
Many real-world problems, such as urban traffic control, network packet delivery, and video games, are naturally modeled as multiagent systems. In a multiagent system, an agent often needs to coordinate with other agents, and significant effort has been devoted to multiagent coordination problems. Most existing approaches are extensions of Q-learning, such as distributed Q-learning, Policy Hill Climbing (PHC), and recursive Frequency Maximum Q-Value (rFMQ). However, these algorithms can only handle multiagent coordination in discrete action spaces.

A number of studies have addressed single-agent learning in continuous action spaces; they fall into two major categories. The most common algorithms are based on function approximation, which can be further divided into value approximation and policy approximation methods; the others are based on Monte Carlo sampling. All of these algorithms are designed for single-agent environments and cannot be applied to multiagent systems directly.

In this thesis, we propose a framework to efficiently solve the multiagent coordination problem in continuous action spaces. Within this framework, we propose a novel algorithm, Continuous Action Learning Automata with recursive Frequency Maximum Q-Value (CALA-rFMQ), that leverages the advantages of existing multiagent discrete-action learning algorithms and single-agent continuous-action algorithms. A CALA-rFMQ agent first samples uniformly in the continuous action space to obtain several discrete actions. Second, each agent selects the expected actions from these sampled actions and divides the continuous action space into sub-spaces; in this step, we extend the idea of rFMQ and propose Win or Learn Slow Policy Hill Climbing (WoLS-PHC) to evaluate each sampled action by its expected reward. Finally, CALA-rFMQ transfers the prior knowledge about the expected actions from the previous step to the continuous sub-spaces as initialization; in this step, we extend CALA with the Win or Learn Slow (WoLS) principle to search the sub-spaces for the final optimal actions, which improves the exploration efficiency of CALA.

The experiments are conducted on single-state continuous-action versions of the climbing game and on the multi-state cooperative boat game. Experimental results show that our algorithm quickly converges to the global optimum and outperforms previous work.
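To make the three steps above concrete, here is a minimal Python sketch of such a sample-evaluate-refine pipeline. The thesis's exact update rules are not given in this abstract: the uniform sampling grid, the rFMQ-style optimistic evaluation, and the CALA-style Gaussian search below are standard textbook forms, and the action interval, the sample count K, the learning rates, and the reward function f are illustrative assumptions (the WoLS step-size modulation is omitted).

import math
import random

# Hypothetical sketch of a CALA-rFMQ-like pipeline; the updates below
# are standard rFMQ / CALA forms, NOT the thesis's exact equations, and
# all constants and the reward f() are assumptions for illustration.

K = 10                   # number of uniformly sampled discrete actions
LOW, HIGH = 0.0, 1.0     # continuous action interval (assumed)

def f(a):
    # Stand-in for the unknown, noisy reward of playing action a.
    return -(a - 0.7) ** 2 + random.gauss(0.0, 0.01)

# Step 1: sample uniformly to obtain discrete candidate actions.
actions = [LOW + (HIGH - LOW) * (i + 0.5) / K for i in range(K)]

# Step 2: rFMQ-style evaluation: per action, track the mean Q-value,
# the best reward seen, and the frequency of observing that best.
Q = [0.0] * K
Rmax = [-math.inf] * K
Freq = [0.0] * K

def rfmq_update(i, r, alpha=0.1, beta=0.1):
    Q[i] += alpha * (r - Q[i])
    if r > Rmax[i]:
        Rmax[i], Freq[i] = r, 1.0
    else:
        Freq[i] += beta * ((1.0 if r >= Rmax[i] else 0.0) - Freq[i])

def evaluate(i):
    if Freq[i] == 0.0:            # never sampled yet
        return Q[i]
    # Optimistic blend between mean and maximum, weighted by frequency.
    return (1.0 - Freq[i]) * Q[i] + Freq[i] * Rmax[i]

for _ in range(500):
    i = random.randrange(K)
    rfmq_update(i, f(actions[i]))

# Step 3: CALA-style Gaussian search in the sub-space around the best
# sampled action, initialized with the knowledge gained in step 2.
best = max(range(K), key=evaluate)
mu, sigma = actions[best], (HIGH - LOW) / K
sigma_min, lam = 0.02, 0.01

for _ in range(2000):
    phi = max(sigma, sigma_min)
    x = random.gauss(mu, phi)
    delta = (f(x) - f(mu)) / phi  # did x beat the current mean action?
    d = (x - mu) / phi
    mu += lam * delta * d         # pull the mean toward better actions
    sigma = max(sigma + lam * delta * (d * d - 1.0), sigma_min)

print("estimated optimal action:", mu)

In the full algorithm each agent would presumably run such a loop against the joint reward of the team, with the WoLS principle modulating the step size depending on whether the agent is currently winning, in the spirit of WoLF-PHC; those details are beyond what this abstract specifies.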
Keywords/Search Tags: Multiagent systems, Continuous action space, Reinforcement learning, Coordination game