
Research on Advancing Value-Based Deep Reinforcement Learning

Posted on: 2021-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: D N Yu
Full Text: PDF
GTID: 2518306020967269
Subject: Systems Engineering
Abstract/Summary:
As a branch of machine learning, deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning to realize end-to-end learning from perception to behavior. Previous research on value-based deep reinforcement learning has achieved great success in sequential decision tasks with high-dimensional perception as input, but potential issues remain, such as inaccurate value-function approximation, inefficient learning, and low utilization of training data. Moreover, in complex partially observable environments, where the state information must be computed by a recurrent neural network, such algorithms still suffer from difficult network training and unstable performance. To address these problems, this paper improves the model structure of value-function-based deep reinforcement learning algorithms. The main work and contributions are as follows:

(1) A Prediction-Information-based Deep Q-Network is proposed. To improve the decision-making performance of the deep Q-learning algorithm and the utilization rate of the training data, the new algorithm embeds an empirical prediction module in the Deep Q-Network. This module provides predictive information with a guiding effect based on experience, which helps the network approximate the optimal value function and improves the decision-making performance of the algorithm. Because the added predictive information is computed from empirical data drawn from the experience replay buffer rather than through model simulation, it improves the utilization rate of training data and avoids the potential modeling problems of complex environments. (A minimal sketch of this architecture is given after this abstract.)

(2) A Deep Q-learning with Recurrent Predictive State Representation Model algorithm is proposed. In a partially observable environment, the agent cannot determine the state from the current observation alone, so it must infer the current state from the history of past actions and observations. Considering the good stability and high data utilization of model-based algorithms, this paper uses a neural network to build a Recurrent Predictive State Representation model of environments with continuous observation spaces, achieving representation and tracking of the state. The learned model is then combined with the Deep Q-Network to solve sequential decision problems in continuous partially observable environments without prior knowledge. (The second sketch below illustrates this combination.)

(3) A Recurrent-Convolutional Network based Value Iteration for POMDPs is proposed. The existing QMDP-net algorithm uses a convolutional neural network to parameterize the QMDP algorithm, enabling a fast and efficient solution of partially observable Markov decision problems. However, simulating the value-iteration update with a convolutional layer and a max-pooling layer leads to unstable performance, so this paper improves the network structure by using a gated recurrent unit network to carry out the value-iteration update. By reconstructing the value-iteration module as a recurrent-convolutional network, the optimization problems caused by the network structure during value iteration are effectively alleviated, and the agent performs better on sequential decision problems in complex, high-dimensional, continuous, partially observable environments. (See the third sketch below.)
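The following is a minimal PyTorch sketch of how an empirical prediction module could be embedded in a Deep Q-Network, as described in contribution (1). All names and shapes here (PredictionInfoDQN, prediction_module, pred_dim, the flattened-transition input format) are illustrative assumptions, not the thesis's actual code.

```python
import torch
import torch.nn as nn

class PredictionInfoDQN(nn.Module):
    """Sketch: a Q-network augmented with an empirical prediction module.
    The prediction module (hypothetical) encodes transitions sampled from
    the replay buffer into a feature vector that is concatenated with the
    state features before the Q-value head."""

    def __init__(self, state_dim, action_dim, pred_dim=32, hidden=128):
        super().__init__()
        # encodes the current state
        self.state_encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU())
        # encodes replayed (s, a, r, s') experience into predictive info;
        # each transition is assumed pre-flattened into one vector
        self.prediction_module = nn.Sequential(
            nn.Linear(2 * state_dim + 2, pred_dim), nn.ReLU())
        # Q head consumes state features plus the predictive information
        self.q_head = nn.Linear(hidden + pred_dim, action_dim)

    def forward(self, state, replayed_transitions):
        h = self.state_encoder(state)                     # (B, hidden)
        p = self.prediction_module(replayed_transitions)  # (B, K, pred_dim)
        p = p.mean(dim=1)                                 # pool over K samples
        return self.q_head(torch.cat([h, p], dim=-1))     # (B, action_dim)
```

Note that the predictive features are computed directly from transitions sampled out of the experience replay buffer, so no environment model has to be learned or simulated, matching the data-utilization argument in contribution (1).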
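For contribution (2), a rough sketch of coupling a recurrent predictive state representation with a Q-value head might look as follows; the GRU-based tracker, the observation-prediction head, and all names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class RecurrentPSRDQN(nn.Module):
    """Sketch of a recurrent predictive-state model combined with deep
    Q-learning for a POMDP with continuous observations. A GRU tracks a
    predictive state from the action-observation history; an observation
    head supplies the model-learning signal, and a Q head maps the
    tracked state to action values."""

    def __init__(self, obs_dim, action_dim, state_dim=64):
        super().__init__()
        # state tracker: consumes (previous action one-hot, observation)
        self.tracker = nn.GRU(obs_dim + action_dim, state_dim,
                              batch_first=True)
        # predicts the next observation from the tracked state
        self.obs_head = nn.Linear(state_dim, obs_dim)
        # value head for control
        self.q_head = nn.Linear(state_dim, action_dim)

    def forward(self, act_obs_seq, h0=None):
        states, hN = self.tracker(act_obs_seq, h0)  # (B, T, state_dim)
        return self.obs_head(states), self.q_head(states), hN
```

In a training loop of this kind, obs_head would be fit to the actual next observations (e.g., with a mean-squared-error loss) so that the hidden state behaves like a predictive state representation, while q_head is trained with the usual temporal-difference loss of deep Q-learning.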
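Finally, for contribution (3), one plausible reading of replacing the max-pooling value backup of a QMDP-net-style module with a gated recurrent update is sketched below; the kernel sizes, the per-cell GRUCell, and the number of iteration steps k are all assumptions rather than the thesis's design.

```python
import torch
import torch.nn as nn

class RecurrentVIModule(nn.Module):
    """Sketch of a recurrent-convolutional value-iteration module.
    A conv layer plays the role of the transition model, producing
    per-action Q maps from the current value map; a GRU cell then
    performs a gated update of each cell's value estimate instead of
    hard-overwriting it with the maximum."""

    def __init__(self, n_actions, k=10):
        super().__init__()
        self.k = k  # number of value-iteration steps
        # value map + reward map -> per-action Q maps
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1)
        # gated recurrent update applied per grid cell
        self.gru = nn.GRUCell(input_size=1, hidden_size=1)

    def forward(self, reward_map):
        # reward_map assumed to be (B, 1, H, W)
        B, _, H, W = reward_map.shape
        v = torch.zeros_like(reward_map)
        for _ in range(self.k):
            q = self.q_conv(torch.cat([v, reward_map], dim=1))
            backup = q.max(dim=1, keepdim=True).values     # (B, 1, H, W)
            # gated update: blend the backed-up value into the old one
            v = self.gru(backup.reshape(-1, 1),
                         v.reshape(-1, 1)).reshape(B, 1, H, W)
        return v
```

The intuition is that the gate lets the network interpolate between the old value map and the backed-up one instead of overwriting it outright, which is one way a gated recurrent unit could stabilize the optimization of the value-iteration module.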
Keywords/Search Tags:Deep Learning, Reinforcement Learning, Value Function, Sequential Decision