
Deep Value Iteration Network For Partially Observable Markov Decision Process

Posted on: 2019-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: G H Miao
Full Text: PDF
GTID: 2518306473453774
Subject: Computer technology

Abstract/Summary:
In recent years, deep reinforcement learning (DRL) has become a new research hotspot in artificial intelligence. DRL has been applied successfully in many fields, such as game playing, machine translation, text generation, and target tracking. However, most DRL models address tasks whose environment states are fully observable, while research on DRL in partially observable environments remains scarce. Moreover, existing models exploit only the generalization ability of deep neural networks to fit the value function or the policy function; such model-free methods can achieve good performance but often ignore the structural information of the task.

To make full use of this structural information, this thesis proposes a deep reinforcement learning algorithm based on value iteration, which effectively combines the advantages of model-free learning and model-based planning. Inspired by ADRQN, the model uses two independent recurrent neural networks: one computes the belief over states and the other iterates the Q-values, and their outputs are integrated to choose the action. Analogous to the belief computation, the latent state of the recurrent network that iterates the Q-values is treated as an embedding representation of the Q-values. By backpropagating gradients, the model learns the structural information of the underlying Markov decision process and thereby carries out the iteration of the Q-values.

Secondly, this thesis analyzes the suboptimality of the coupled action-observation input form in the ADRQN model. To avoid the suboptimality caused by the difference between the input forms of observations and actions, this thesis makes a simple modification to the model's input structure and applies it to standard POMDP tasks. Compared with ADRQN, the proposed model achieves better performance on multiple navigation tasks, verifying the effectiveness of the improvements proposed in this thesis.
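For readers unfamiliar with the planning component that the recurrent network is trained to emulate, the following minimal sketch shows classical tabular Q-value iteration (the Bellman optimality backup) on a toy fully observable MDP. The MDP, transition table, and all names here are illustrative and are not taken from the thesis; the thesis's model performs analogous iterations implicitly inside a learned recurrent module over belief states.

```python
# Illustrative sketch: tabular Q-value iteration on a toy two-state MDP.
# All structures below are made up for demonstration; the thesis's network
# learns to approximate this kind of Bellman backup, it does not run it.

def q_value_iteration(P, R, gamma=0.9, n_iters=100):
    """P[s][a] is a list of (prob, next_state); R[s][a] is a reward."""
    n_states = len(P)
    n_actions = len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        new_Q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                # Bellman optimality backup: r(s,a) + gamma * E[max_a' Q(s',a')]
                new_Q[s][a] = R[s][a] + gamma * sum(
                    p * max(Q[s2]) for p, s2 in P[s][a]
                )
        Q = new_Q
    return Q

# Toy chain: from state 0, action 0 stays and action 1 moves to state 1;
# state 1 is absorbing and pays reward 1 for either action.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # transitions from state 0
    [[(1.0, 1)], [(1.0, 1)]],  # transitions from state 1
]
R = [[0.0, 0.0], [1.0, 1.0]]

Q = q_value_iteration(P, R)
# Greedy policy from the converged Q-values: in state 0, move to state 1.
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
```

With discount 0.9 the fixed point is V(1) = 1/(1-0.9) = 10 and Q(0, move) = 0.9 * 10 = 9, so the greedy policy in state 0 is to move toward the rewarding state. In the thesis's setting the true state is hidden, so the backup operates over beliefs rather than states, which is what the learned recurrent iteration approximates.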
Keywords/Search Tags: Reinforcement Learning, Deep Learning, Partially Observable Markov Decision Process, Value Iteration