
Research On Multi-Agent Reinforcement Learning Under Sparse Reward Scenario

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: S N Chen
Full Text: PDF
GTID: 2518306122477934
Subject: Computer technology
Abstract/Summary:
The setting of the reward function greatly impacts the solution quality of reinforcement learning tasks. For tasks such as the exploration of new environments, reward functions are usually sparse, i.e., they return high-valued rewards for only a few states and provide no reward guidance to the agent in most cases. This reward sparsity results in low learning efficiency. While there are a few works tackling reward sparsity in single-agent settings, the problem is much more difficult in multi-agent reinforcement learning, because 1) rewards become extremely sparse as the number of agents increases, and 2) information from multiple agents must be considered jointly to alleviate environment non-stationarity, which can lead to the curse of dimensionality. Studying methods that apply multi-agent reinforcement learning to sparse-reward tasks, while improving the training speed and scalability of the algorithms, is therefore of practical significance for applying multi-agent reinforcement learning to real-world problems.

This thesis focuses on the sparse-reward problem in multi-agent scenarios. The main contributions are as follows:

(1) A multi-agent backtracking-based deep deterministic policy gradient algorithm is proposed to address the low sample efficiency caused by sparse rewards. The key idea is to reuse valuable experience from other agents to speed up learning. A wait-and-see method is proposed to identify high-value traces of agents, from which recall traces are generated by a trained backtracking model to guide the training of the other agents. Experimental results show that the algorithm significantly outperforms existing approaches on sparse-reward tasks, achieving faster convergence.

(2) The thesis further improves the multi-agent backtracking-based deep deterministic policy gradient algorithm and proposes a multi-agent selective learning algorithm, making it suitable for scenarios with a large number of agents. A similarity-based selector chooses and uses information from only the most closely related agents rather than all of them, which effectively avoids the curse of dimensionality. By combining the selection of high-value trajectories with the selection of related agents, the algorithm improves sample efficiency and copes with high dimensionality, thereby greatly improving training speed. Experimental results show that the proposed algorithm significantly outperforms existing approaches on sparse-reward tasks, achieving faster convergence and better scalability.
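The abstract gives no implementation details for contribution (1), but the recall-trace idea can be sketched minimally: rank finished episodes by return, keep a top fraction as "high-value" traces (a stand-in for the wait-and-see criterion), and step a learned backtracking model backwards from a high-reward state to synthesize training traces. The function names, the top-fraction threshold, and the form of the backtracking model are all hypothetical, not taken from the thesis.

```python
def select_high_value_traces(episodes, top_frac=0.2):
    """Hypothetical 'wait-and-see' filter: rank finished episodes
    (lists of (state, action, reward) tuples) by total return and
    keep only the top fraction as high-value traces."""
    ranked = sorted(episodes,
                    key=lambda ep: sum(r for _, _, r in ep),
                    reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    return ranked[:k]

def recall_trace(backtrack_model, final_state, length):
    """Generate a recall trace by stepping a (learned) backtracking
    model backwards from a high-reward terminal state; the model maps
    a state to a plausible (previous_state, action) pair."""
    trace = []
    state = final_state
    for _ in range(length):
        prev_state, action = backtrack_model(state)
        trace.append((prev_state, action, state))
        state = prev_state
    trace.reverse()  # return in chronological order for training
    return trace
```

In the actual algorithm the backtracking model would be a trained network and the recall traces would be fed into other agents' DDPG updates; here it is reduced to the control flow only.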
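The similarity-based selector of contribution (2) can likewise be illustrated with a minimal sketch: instead of conditioning each agent on the joint information of all N agents, rank the other agents by a similarity measure over their observations and keep only the k most related. Cosine similarity and the parameter k are illustrative assumptions; the thesis does not specify the measure here.

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def select_related_agents(obs, agent_id, k=2):
    """Hypothetical similarity-based selector: rank the other agents
    by similarity of their observation vectors to agent `agent_id`
    and keep the k most related, so the input dimension grows with k
    rather than with the total number of agents."""
    me = obs[agent_id]
    others = [(j, cosine(me, o)) for j, o in enumerate(obs) if j != agent_id]
    others.sort(key=lambda t: t[1], reverse=True)
    return [j for j, _ in others[:k]]
```

With k fixed, the per-agent input size no longer scales with the number of agents, which is the scalability argument the abstract makes.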
Keywords/Search Tags:Multi-agent learning, Reinforcement learning, Sparse reward