Research On Multi-Agent Deep Reinforcement Learning Methods And Applications

Posted on: 2019-07-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 1368330572450129
Subject: Communication and Information System
Abstract/Summary:
In the era of big data, the most urgent and significant problem is how to obtain the most valuable information from massive data. The difficulty has two sources. One is the complexity of processing massive data. The other is that we face a multi-agent system in which individuals differ from one another, so what counts as the most valuable information varies from one individual to the next. We therefore study multi-agent deep reinforcement learning methods to discover users' patterns, so that the most valuable information for each individual can be learned and their quality of experience (QoE) maximized. Users' patterns can further be applied to a variety of applications, such as custom recommender systems, automatic control, dynamic resource allocation and smart navigation.

Deep learning is capable of extracting features from highly complicated data, allowing computers to sense abstract concepts, which makes it an efficient tool for mining massive data. Reinforcement learning methods, meanwhile, allow agents to learn custom behaviors from rewards. Combining the two, deep reinforcement learning can learn an optimal policy directly from raw data, with features extracted by deep neural networks. Multi-agent deep reinforcement learning, however, faces further challenges. In a multi-agent system, agents have to take both the environment and the behaviors of other agents into consideration. Moreover, because individuals are diverse, no unified description of rewards exists. Users' patterns are therefore required to describe this diversity, so that optimal custom rewards can be achieved for each agent. This dissertation contributes in the following aspects.

Firstly, a centralized multi-agent deep reinforcement learning framework is proposed to support custom data mining in a big data environment. Currently, most data mining methods are indiscriminate, so the most desired data cannot be mined according to users' patterns. We therefore consider applying multi-agent deep reinforcement learning to allow each agent to extract knowledge from raw data. This, however, is inefficient and impractical because of the high computational cost of deep learning. As a result, we apply a centralized multi-agent deep reinforcement learning framework, in which extraction is performed with deep learning methods and users apply multi-agent reinforcement learning methods to learn custom patterns, so that the QoE of users can be maximized.

Secondly, a generative model of users' patterns in multi-agent systems is discussed, so that the diversity of agents can be defined. Traditional reinforcement learning is in effect a discriminative model. Our generative model describes users' patterns as an intractable distribution over the action space of agents. Agents sample from this distribution and use the estimated pattern to guide their behaviors. The experimental results suggest that the estimated pattern is similar to the true one after convergence.

Thirdly, social behavior among homogeneous agents is studied with reinforcement learning in the setting of pervasive social networking. In such a multi-agent system, a predefined relationship of cooperation or competition seems unreasonable; the roles of agents should be determined by their patterns. We therefore first define suitable users' patterns for this environment and then propose the QLA and VLA methods to learn them. After determining users' roles, we focus on competitive agents. Finally, we use typical games, namely multi-agent versions of the prisoner's dilemma and the Cournot model, as instances for studying social behaviors.

Fourthly, we propose dynamic resource allocation methods for a system with inhomogeneous agents in the setting of cognitive radio networks. Since such agents have different strategies, they have to be treated separately. The most typical instance of inhomogeneous agents is primary users and secondary users in telecommunication systems. Here we consider only the case of one primary user and multiple secondary users, in which resources are controlled by the primary user and wait to be distributed to the secondary users. After analysis, we model this environment as a monopoly market. We then propose the TPQ method and a bidirectional reinforcement learning method for secondary users and primary users, respectively, to learn their strategies, leading to a dynamic equilibrium while they maximize their rewards.

Finally, we discuss applications in the internet of vehicles, which is a complicated multi-agent environment. We propose the concept of social vehicle swarms, in which vehicles are treated as the basic units of a swarm. We apply agent-based modelling and deep reinforcement learning methods to mine custom data, so that each agent can obtain its desired data and maximize its profit. We also consider the cooperation of multiple sensors from agents for a public application. Specifically, we propose vehicle tracking based on data fusion, along with custom vehicle tracking based on reinforcement learning and Faster R-CNN, which improves the flexibility of tracking.
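
To make the multi-agent Cournot game named in the abstract concrete, the following is a minimal Python sketch of that game alone. It is not the QLA or VLA method, whose details the abstract does not give. The sketch assumes a linear inverse demand P = a - b * Q and a constant marginal cost c, and it swaps learning for a simple damped best-response update. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
def best_response(others_total, a, b, c):
    """Profit-maximizing output of one firm given the rivals' total output.

    Assumption: inverse demand P = a - b * Q and constant marginal cost c,
    so the firm's profit is (a - b * (q + others_total) - c) * q.
    """
    return max(0.0, (a - c - b * others_total) / (2.0 * b))


def cournot_dynamics(quantities, a=100.0, b=1.0, c=10.0, step=0.5, iterations=200):
    """Move every firm part of the way toward its best response, simultaneously.

    `step` is a hypothetical damping factor in (0, 1]; it is not taken
    from the dissertation.
    """
    q = list(quantities)
    for _ in range(iterations):
        total = sum(q)
        # Each firm reacts to the output of all the other firms.
        targets = [best_response(total - qi, a, b, c) for qi in q]
        q = [(1.0 - step) * qi + step * ti for qi, ti in zip(q, targets)]
    return q


def nash_quantity(n, a=100.0, b=1.0, c=10.0):
    """Closed-form symmetric Nash output per firm for n identical firms."""
    return (a - c) / (b * (n + 1))


if __name__ == "__main__":
    final = cournot_dynamics([0.0, 10.0, 40.0])
    print("final outputs:", [round(x, 4) for x in final])
    print("Nash output  :", nash_quantity(3))
```

With three firms and these default values, the outputs settle at 22.5 each, which matches (a - c) / (b * (n + 1)). Under these linear assumptions the simultaneous update is stable only for step < 4 / (n + 1), so a game with more firms needs a smaller step.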
Keywords/Search Tags:artificial intelligence, multi-agent systems, reinforcement learning, deep learning, users' patterns, generative model