Deep Reinforcement Learning for Partial Observability

Posted on: 2019-03-07  Degree: Master  Type: Thesis
Country: China  Candidate: P F Zhu  Full Text: PDF
GTID: 2518306473954119  Subject: Computer Science and Technology
Abstract/Summary:
Deep reinforcement learning (RL) has recently emerged as one of the most competitive approaches to sequential decision-making problems in fully observable environments, e.g., computer Go. However, most work on deep RL focuses on problems in which the environment's state is fully observable, and very little has been done to handle partially observable environments. Moreover, almost all models proposed for partially observable tasks share a theoretical limitation: they do not accurately capture the current state of the environment. According to conventional Partially Observable Markov Decision Process (POMDP) theory, actions are indispensable to the iterative update of the belief state, yet these existing methods do not start from this point and ignore the influence of actions. This thesis therefore proposes a new architecture, the Action-specific Deep Recurrent Q-Network (ADRQN), which aims to enhance learning performance in partially observable domains by learning the belief state more accurately. Actions are encoded by a fully connected layer and coupled with convolutional features of the observation to form action-observation pairs. The time series of action-observation pairs is then integrated by an LSTM layer that learns latent states, which can be seen as an embedding of the belief state; based on these latent states, a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We evaluate the new architecture in several partially observable domains, including several standard and flickering Atari games. The results show that our model achieves higher scores in all experiments, generalizes better, and is more robust, which demonstrates its effectiveness.
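To illustrate why actions matter in the belief-state update mentioned above, the following is a minimal sketch of the standard discrete POMDP belief update, in which the new belief over next states is proportional to the observation probability times the action-conditioned prediction from the previous belief. It is not code from the thesis and it is not the ADRQN network itself. The function name `belief_update`, the two-state toy problem, and the transition and observation tables `T` and `O` are all hypothetical and chosen only for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Exact Bayesian belief update for a small discrete POMDP.

    New belief b'(s') is proportional to
        O[action][s'][observation] * sum_s T[action][s][s'] * b(s).
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: where does the chosen action move the probability mass?
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correction step: weight by how likely the observation is in s_next.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [p / total for p in unnormalized]


# Hypothetical two-state toy problem (assumption, not from the thesis).
# T[action][s][s_next]: "stay" keeps the state, "swap" flips it.
T = {
    "stay": [[1.0, 0.0], [0.0, 1.0]],
    "swap": [[0.0, 1.0], [1.0, 0.0]],
}
# O[action][s_next][observation]: a noisy sensor that is right 80% of the time.
O = {
    "stay": [[0.8, 0.2], [0.2, 0.8]],
    "swap": [[0.8, 0.2], [0.2, 0.8]],
}

prior = [0.9, 0.1]
after_stay = belief_update(prior, "stay", 0, T, O)
after_swap = belief_update(prior, "swap", 0, T, O)
print(after_stay, after_swap)
```

In this toy setting, the same prior and the same observation lead to different beliefs depending on which action was taken, which is the motivation the abstract gives for feeding action-observation pairs, rather than observations alone, into the recurrent layer.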
Keywords/Search Tags: Reinforcement Learning, Deep Learning, Partially Observable Markov Decision Process, Q-Learning