
Research On Deep Reinforcement Learning Method For Environment With Non-stationary Dynamics

Posted on: 2022-01-18    Degree: Master    Type: Thesis
Country: China    Candidate: Y Pu    Full Text: PDF
GTID: 2518306323979719    Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, deep reinforcement learning methods have been applied successfully to video games, Go, poker, robotic control and other fields. However, many problems and challenges remain, such as low sample efficiency, the exploration-exploitation dilemma, extreme sensitivity to hyperparameters, and poor convergence and reproducibility. In particular, when the dynamics of the environment (including the state transition probability function and the reward function) change, deep reinforcement learning algorithms become especially unstable. How to obtain an efficient, stable and general reinforcement learning method in such environments is therefore an important research direction. To address these issues, this dissertation conducts research in single-agent and multi-agent scenarios respectively; the main work and innovations are as follows.

In the single-agent setting, we propose a latent context based soft actor-critic method (LC-SAC). This method introduces an additional latent context encoder module. The encoder uses a recurrent neural network, takes experience transition triples (state, action and reward) as inputs, and outputs a context variable. By optimizing a contrastive prediction loss function, the context vector captures information about the environment dynamics and the recent behavior of the agent, which is critical for effective policy optimization in environments with non-stationary dynamics. Combined with the soft policy iteration paradigm, LC-SAC then alternates between soft policy evaluation and soft policy improvement. Experimental results show that LC-SAC significantly outperforms the SAC algorithm on the MetaWorld ML1 tasks, whose dynamics change across episodes, and is comparable to SAC on the continuous control benchmark MuJoCo, whose dynamics change slowly or not at all between episodes. We also conduct experiments to determine the impact of different hyperparameter settings on the performance of LC-SAC and give recommendations for choosing them.

In the multi-agent setting, we propose a multi-agent soft actor-critic method (mSAC) based on action-value function decomposition, which effectively combines multi-agent value function decomposition with policy-based methods. Its main modules include a decomposed Q network architecture, discrete probabilistic policies, and an optional counterfactual advantage function. Theoretically, mSAC supports efficient off-policy learning and can be applied to tasks with either discrete or continuous action spaces. On the StarCraft II micromanagement benchmark, a real-time-strategy game, we empirically investigate the performance of mSAC against its variants and analyze the effects of its different components. Experimental results demonstrate that mSAC significantly outperforms the policy-based approach COMA and achieves results competitive with the state-of-the-art value-based approach QMIX on most tasks in terms of asymptotic performance. In addition, mSAC substantially outperforms QMIX on many tasks with large action spaces.

In summary, this dissertation addresses the problem of non-stationary dynamics in complex environments and proposes corresponding improved algorithms in single-agent and multi-agent scenarios, respectively. Good experimental results have been achieved, which have practical application value and help drive the development of the reinforcement learning field.
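To make the single-agent idea concrete, the following is a minimal sketch (not the thesis code; all class and parameter names are illustrative) of a latent context encoder of the kind described above: a GRU consumes (state, action, reward) transition triples and emits a context vector, and an InfoNCE-style contrastive prediction loss encourages that vector to identify future transitions from the same episode among negatives drawn from other episodes in the batch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentContextEncoder(nn.Module):
    """Hypothetical context encoder: transition triples -> context vector."""
    def __init__(self, state_dim, action_dim, context_dim=32, hidden_dim=128):
        super().__init__()
        trans_dim = state_dim + action_dim + 1          # (s, a, r) flattened
        self.gru = nn.GRU(trans_dim, hidden_dim, batch_first=True)
        self.to_context = nn.Linear(hidden_dim, context_dim)
        # Bilinear score matrix used by the contrastive prediction objective.
        self.W = nn.Parameter(torch.randn(context_dim, trans_dim) * 0.01)

    def forward(self, transitions):
        # transitions: (batch, seq_len, trans_dim) -> context: (batch, context_dim)
        _, h = self.gru(transitions)
        return self.to_context(h.squeeze(0))

    def contrastive_loss(self, context, future_transitions):
        # Positive pair: a context and a future transition from the same episode;
        # negatives: the future transitions of the other episodes in the batch.
        scores = context @ self.W @ future_transitions.t()   # (batch, batch)
        labels = torch.arange(scores.size(0), device=scores.device)
        return F.cross_entropy(scores, labels)

In such a design the learned context vector would be concatenated with the state and fed to both the SAC actor and critics, so the policy can condition on the inferred dynamics.

For the multi-agent method, the sketch below illustrates (again as an assumption about one plausible realization, not the author's implementation) the value-decomposition side of mSAC: per-agent utility networks produce local action values that a monotonic mixer combines into a joint value, which then serves as the critic for soft actor-critic updates with discrete stochastic policies.

import torch
import torch.nn as nn

class AgentQ(nn.Module):
    """Per-agent utility Q_i over the agent's local observation."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)                            # (batch, n_actions)

class AdditiveMixer(nn.Module):
    """Simplest monotonic mixer: Q_tot = sum_i Q_i (a VDN-style stand-in;
    a QMIX-style mixer would use a state-conditioned hypernetwork instead)."""
    def forward(self, agent_qs):
        # agent_qs: (batch, n_agents) of chosen-action utilities
        return agent_qs.sum(dim=-1, keepdim=True)       # (batch, 1)

Each agent would additionally carry a categorical (discrete probabilistic) policy, and the SAC temperature term adds the usual policy-entropy bonus to the mixed critic target, which is what allows efficient off-policy learning while retaining the centralized-training, decentralized-execution structure of value decomposition methods.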
Keywords/Search Tags:Deep Reinforcement Learning, Non-stationary Environment, Multi-Agent, Game Operation, Robotic Control