Deep Reinforcement Learning For Partial Observability

Posted on:2019-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:P F Zhu

Full Text:PDF

GTID:2518306473954119

Subject:Computer Science and Technology

Abstract/Summary:

Deep reinforcement learning (RL) has recently emerged as one of the most competitive approaches to sequential decision-making problems in fully observable environments, e.g., computer Go. However, most work on deep RL focuses on problems in which the environment's state is fully observable, and very little has addressed partially observable environments. Moreover, almost all models proposed for partially observable tasks share a theoretical limitation: they do not track the current state of the environment accurately. According to conventional Partially Observable Markov Decision Process (POMDP) theory, actions are indispensable to the iterative update of belief states, yet these existing methods do not start from this point and ignore the influence of actions. This paper therefore proposes a new architecture, the Action-specific Deep Recurrent Q-Network (ADRQN), which aims to improve learning performance in partially observable domains by learning the belief state accurately. Actions are encoded by a fully connected layer and coupled with a convolutionally encoded observation to form an action-observation pair. The time series of action-observation pairs is then integrated by an LSTM layer that learns latent states, which can be seen as an embedding of the belief state; from these latent states, a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We evaluate the new architecture in several partially observable domains, including several standard and flickering Atari games. The results show that our model achieves higher scores in all experiments, generalizes better, and is more robust, which demonstrates its effectiveness.
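
The claim that actions are indispensable to the belief update can be illustrated with the standard discrete POMDP belief filter, b'(s') proportional to O(o | s', a) * sum over s of T(s' | s, a) * b(s). The sketch below is not taken from the thesis: the two-state toy problem, the tables `T` and `O`, and the function name `belief_update` are all assumptions made for illustration. It only shows that the same prior belief and the same observation can yield different posteriors under different actions.

```python
from typing import List

Belief = List[float]

def belief_update(belief: Belief, action: int, observation: int,
                  T: List[List[List[float]]],
                  O: List[List[List[float]]]) -> Belief:
    """Standard discrete POMDP belief filter (hypothetical helper).

    T[a][s][s2] is the probability of moving from state s to s2 under action a.
    O[a][s2][o] is the probability of observing o after reaching s2 via action a.
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: where could we be after taking `action`?
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correction step: weight by how well s_next explains the observation.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation is impossible under this belief and action")
    return [p / total for p in unnormalized]

# Toy problem (assumed, not from the thesis): two states, two actions.
STAY, SWITCH = 0, 1
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # STAY keeps the current state
    [[0.0, 1.0], [1.0, 0.0]],  # SWITCH flips the state
]
# The sensor reports the true state with probability 0.8, whatever the action.
sensor = [[0.8, 0.2], [0.2, 0.8]]
O = [sensor, sensor]

prior = [0.9, 0.1]
b_stay = belief_update(prior, STAY, 0, T, O)      # roughly [0.973, 0.027]
b_switch = belief_update(prior, SWITCH, 0, T, O)  # roughly [0.308, 0.692]
```

With an identical prior and an identical observation, the posterior puts most of its mass on state 0 after `STAY` and on state 1 after `SWITCH`. A model that receives only the observation sequence cannot distinguish these two cases, which is the gap that feeding action-observation pairs to the recurrent layer is meant to close.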

Keywords/Search Tags:

Reinforcement Learning, Deep Learning, Partially Observable Markov Decision Process, Q-Learning

Related items

1	Deep Value Iteration Network For Partially Observable Markov Decision Process
2	Heuristic Learning Model Based On Partially Observable Markov Decision Process
3	Research On Optimization Of Service Composition Based On Partially Observable Environment
4	The Reinforcement Learning Research Based On Internal State In Partially Observable Markov Decision Processes
5	Learning partially observable Markov decision processes using abstract actions
6	Hierarchical learning and planning in partially observable Markov decision processes
7	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
8	Increasing scalability in algorithms for centralized and decentralized partially observable Markov decision processes: Efficient decision-making and coordination in uncertain environments
9	Research On Sample-efficient Reinforcement Learning Methods
10	Research And Implementation Of Strategy Effectiveness Guarantee Mechanism For Self-Adaptive Software