
Research On Multi-Agent Reinforcement Learning Under Sparse Reward Scenario

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: S N Chen
Full Text: PDF
GTID: 2518306122477934
Subject: Computer technology
Abstract/Summary:
The setting of the reward function greatly impacts the solution quality of reinforcement learning tasks. For tasks such as the exploration of new environments, reward functions are usually sparse, i.e., they return high-valued rewards for only a few states and provide no reward guidance to the agent in most cases. This reward sparsity results in low learning efficiency. While there are a few works tackling reward sparsity in single-agent settings, the problem is much more difficult in multi-agent reinforcement learning, because 1) rewards become extremely sparse as the number of agents increases, and 2) information from multiple agents must be considered jointly to alleviate environment non-stationarity, which can lead to the curse of dimensionality. Studying methods that apply multi-agent reinforcement learning to sparse-reward tasks, while improving the training speed and scalability of the algorithms, is therefore of practical significance for applying multi-agent reinforcement learning to real-world problems.

This thesis focuses on the sparse-reward problem in multi-agent scenarios. The main contributions are as follows:

(1) A multi-agent backtracking-based deep deterministic policy gradient algorithm is proposed to address the low sample efficiency caused by sparse rewards. The key idea is to reuse valuable experience from other agents to speed up learning. A wait-and-see method is proposed to identify high-value traces of agents, from which recall traces are generated by a trained backtracking model to guide the training of the other agents. Experimental results show that the algorithm significantly outperforms existing approaches on sparse-reward tasks, achieving faster convergence.

(2) The thesis further improves the multi-agent backtracking-based deep deterministic policy gradient algorithm and proposes a multi-agent selective learning algorithm, making it suitable for scenarios with a large number of agents. A similarity-based selector chooses and uses information from only the most closely related agents rather than all of them, which effectively avoids the curse of dimensionality. By combining the selection of high-value trajectories with the selection of related agents, the algorithm improves sample efficiency and copes with high dimensionality, thereby greatly improving training speed. Experimental results show that the proposed algorithm significantly outperforms existing approaches on sparse-reward tasks, achieving faster convergence and better scalability.
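The abstract gives no implementation details for contribution (1), but the recall-trace idea can be sketched minimally: rank finished episodes by return, keep a top fraction as "high-value" traces (a stand-in for the wait-and-see criterion), and step a learned backtracking model backwards from a high-reward state to synthesize training traces. The function names, the top-fraction threshold, and the form of the backtracking model are all hypothetical, not taken from the thesis.

```python
def select_high_value_traces(episodes, top_frac=0.2):
    """Hypothetical 'wait-and-see' filter: rank finished episodes
    (lists of (state, action, reward) tuples) by total return and
    keep only the top fraction as high-value traces."""
    ranked = sorted(episodes,
                    key=lambda ep: sum(r for _, _, r in ep),
                    reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    return ranked[:k]

def recall_trace(backtrack_model, final_state, length):
    """Generate a recall trace by stepping a (learned) backtracking
    model backwards from a high-reward terminal state; the model maps
    a state to a plausible (previous_state, action) pair."""
    trace = []
    state = final_state
    for _ in range(length):
        prev_state, action = backtrack_model(state)
        trace.append((prev_state, action, state))
        state = prev_state
    trace.reverse()  # return in chronological order for training
    return trace
```

In the actual algorithm the backtracking model would be a trained network and the recall traces would be fed into other agents' DDPG updates; here it is reduced to the control flow only.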
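The similarity-based selector of contribution (2) can likewise be illustrated with a minimal sketch: instead of conditioning each agent on the joint information of all N agents, rank the other agents by a similarity measure over their observations and keep only the k most related. Cosine similarity and the parameter k are illustrative assumptions; the thesis does not specify the measure here.

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def select_related_agents(obs, agent_id, k=2):
    """Hypothetical similarity-based selector: rank the other agents
    by similarity of their observation vectors to agent `agent_id`
    and keep the k most related, so the input dimension grows with k
    rather than with the total number of agents."""
    me = obs[agent_id]
    others = [(j, cosine(me, o)) for j, o in enumerate(obs) if j != agent_id]
    others.sort(key=lambda t: t[1], reverse=True)
    return [j for j, _ in others[:k]]
```

With k fixed, the per-agent input size no longer scales with the number of agents, which is the scalability argument the abstract makes.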
Keywords/Search Tags:Multi-agent learning, Reinforcement learning, Sparse reward