
Research On Environment Adaptive Reinforcement Learning Methods

Posted on: 2022-08-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Wang | Full Text: PDF
GTID: 2518306323462454 | Subject: Computer application technology
Abstract/Summary:
Although reinforcement learning has been successfully applied in many areas, its applications are still limited by the sparse reward problem and the environment non-stationarity problem. The performance of reinforcement learning strongly depends on how well the reward signal frames the goal of the application's designer and how well the model addresses environment non-stationarity. These two issues essentially reflect the accuracy of environment modeling and the stability of the optimization process. Reward function adaptation and environment dynamics adaptation are critical parts of applying reinforcement learning to non-standard environments, and they require that the algorithm can automatically design the reward function and train adaptively in complex environments. From the perspectives of environment modeling and of the solving process, we conduct research on environment adaptation methods for reinforcement learning through reward function adaptation and environment dynamics adaptation.

From the perspective of environment modeling, to address the challenge of reward design we propose the Motivation-Based Reward Design (MBRD) method. MBRD introduces the concept of motivation, which captures the underlying goal of maximizing certain rewards. The basic idea of MBRD is to automatically generate goal-consistent intrinsic rewards for the agent by minimizing the distance between the intrinsic and extrinsic motivations (a toy sketch of this idea follows the abstract). The core of MBRD is to solve two problems: how to map a reward function to a motivation, and how to measure the distance between motivations. MBRD provides the ability to improve the reward function based on training dynamics. We conduct extensive experiments in three Grid-world environments and two MuJoCo environments, and show the advantages of MBRD in handling the problems of delayed reward, exploration, and credit assignment.

From the perspective of the solving process, to address the challenge of environment non-stationarity we propose the Policy Adaptive Multi-Agent Deep Deterministic Policy Gradient (PAMADDPG) method. We model environment non-stationarity with a finite set of scenarios and train a policy fitting each scenario. In addition to the multiple policies, each agent also learns a policy predictor that determines, from its local information, which policy is best to execute (see the second sketch below). The core of PAMADDPG is to solve two problems: how to train multi-policy agents, and how to choose the execution policy. PAMADDPG provides the ability to train stably under non-stationary environment dynamics. We empirically evaluate our method on three Multi-Agent Particle Environment scenarios and show that PAMADDPG performs better than the baseline methods on mixed cooperative-competitive domains and a fully cooperative domain.
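The following is a minimal sketch of the MBRD idea, not the thesis's actual implementation. It assumes, for illustration only, that a "motivation" is represented as the discounted return a reward function induces along a sampled trajectory, that the distance between motivations is a squared error, and that the intrinsic reward is a small learned network r_phi(s, a); the names IntrinsicReward, discounted_motivation, and mbrd_update are hypothetical.

```python
# Illustrative MBRD sketch (assumptions above; not the thesis's code).
import torch
import torch.nn as nn

class IntrinsicReward(nn.Module):
    """Learnable intrinsic reward r_phi(s, a) (hypothetical parameterization)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def discounted_motivation(rewards, gamma: float = 0.99):
    """Map a (T,) reward sequence to a scalar 'motivation': its discounted return.
    This is one assumed choice for the reward-to-motivation mapping."""
    discounts = gamma ** torch.arange(rewards.shape[0], dtype=rewards.dtype)
    return (discounts * rewards).sum()

def mbrd_update(reward_model, optimizer, obs, act, extrinsic_r, gamma=0.99):
    """One gradient step pulling the intrinsic motivation toward the extrinsic one."""
    intrinsic_r = reward_model(obs, act)                 # (T,) intrinsic rewards
    m_int = discounted_motivation(intrinsic_r, gamma)
    m_ext = discounted_motivation(extrinsic_r, gamma)
    loss = (m_int - m_ext) ** 2                          # assumed motivation distance
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage on a random 50-step trajectory with a sparse terminal reward.
torch.manual_seed(0)
model = IntrinsicReward(obs_dim=4, act_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs, act = torch.randn(50, 4), torch.randn(50, 2)
ext = torch.zeros(50); ext[-1] = 1.0                     # reward only at the end
for _ in range(100):
    mbrd_update(model, opt, obs, act, ext)
```

Because the intrinsic reward is dense while the extrinsic reward is sparse, training it to induce the same motivation is one way an agent could receive goal-consistent per-step feedback, which matches the delayed-reward and credit-assignment advantages the abstract reports.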
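The second sketch illustrates only the execution-time policy selection in PAMADDPG, again as an interpretation rather than the thesis's implementation. It assumes each agent holds K deterministic policies, one trained per scenario, and that the policy predictor is a small classifier over the agent's local observation; how the K policies themselves are trained (MADDPG-style, with centralized critics) is omitted, and PolicyPredictor and act are hypothetical names.

```python
# Illustrative PAMADDPG policy-selection sketch (assumptions above).
import torch
import torch.nn as nn

class PolicyPredictor(nn.Module):
    """Scores the K per-scenario policies from the agent's local observation."""
    def __init__(self, obs_dim: int, n_policies: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_policies),
        )

    def forward(self, obs):
        return self.net(obs)  # unnormalized score per policy

def act(obs, policies, predictor):
    """Pick the best-scoring policy for this local observation and act with it."""
    with torch.no_grad():
        k = predictor(obs).argmax(dim=-1).item()
    return policies[k](obs), k

# Toy usage: 3 scenario policies, each a small deterministic actor.
obs_dim, act_dim, K = 8, 2, 3
policies = [nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                          nn.Linear(32, act_dim), nn.Tanh()) for _ in range(K)]
predictor = PolicyPredictor(obs_dim, K)
obs = torch.randn(obs_dim)
action, chosen = act(obs, policies, predictor)
```

The design point this illustrates is that selection uses only local information, so each agent can adapt at execution time without observing which scenario, and hence which non-stationary dynamics, it is actually facing.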
Keywords/Search Tags: Reinforcement Learning, Reward Design, Multi-Agent System