
Research on Advancing Value-Based Deep Reinforcement Learning

Posted on: 2021-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: D N Yu
Full Text: PDF
GTID: 2518306020967269
Subject: Systems Engineering
Abstract/Summary:
As a branch of machine learning, deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning to realize end-to-end learning from perception to behavior. Previous research on value-based deep reinforcement learning has achieved great success in sequential decision tasks with high-dimensional perception as input, but potential issues remain, such as inaccurate value-function approximation, inefficient learning, and low utilization of training data. Moreover, in complex partially observable environments, where the state information must be computed by a recurrent neural network, such algorithms still suffer from difficult network training and unstable performance. To address these problems, this paper improves the model structure of value-function-based deep reinforcement learning algorithms. The main work and contributions are as follows:

(1) A Prediction-Information-based Deep Q-Network is proposed. To improve the decision-making performance of the deep Q-learning algorithm and the utilization rate of the training data, the new algorithm embeds an empirical prediction module in the Deep Q-Network. This module provides predictive information with a guiding effect based on experience, which helps the network approximate the optimal value function and improves the decision-making performance of the algorithm. Because the added predictive information is computed from empirical data drawn from the experience replay buffer rather than through model simulation, it improves the utilization rate of training data and avoids the potential modeling problems of complex environments. (A minimal sketch of this architecture is given after this abstract.)

(2) A Deep Q-learning with Recurrent Predictive State Representation Model algorithm is proposed. In a partially observable environment, the agent cannot determine the state from the current observation alone, so it must infer the current state from the history of past actions and observations. Considering the good stability and high data utilization of model-based algorithms, this paper uses a neural network to build a Recurrent Predictive State Representation model of environments with continuous observation spaces, achieving representation and tracking of the state. The learned model is then combined with the Deep Q-Network to solve sequential decision problems in continuous partially observable environments without prior knowledge. (The second sketch below illustrates this combination.)

(3) A Recurrent-Convolutional Network based Value Iteration for POMDPs is proposed. The existing QMDP-net algorithm uses a convolutional neural network to parameterize the QMDP algorithm, enabling a fast and efficient solution of partially observable Markov decision problems. However, simulating the value-iteration update with a convolutional layer and a max-pooling layer leads to unstable performance, so this paper improves the network structure by using a gated recurrent unit network to carry out the value-iteration update. By reconstructing the value-iteration module as a recurrent-convolutional network, the optimization problems caused by the network structure during value iteration are effectively alleviated, and the agent performs better on sequential decision problems in complex, high-dimensional, continuous, partially observable environments. (See the third sketch below.)
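The following is a minimal PyTorch sketch of how an empirical prediction module could be embedded in a Deep Q-Network, as described in contribution (1). All names and shapes here (PredictionInfoDQN, prediction_module, pred_dim, the flattened-transition input format) are illustrative assumptions, not the thesis's actual code.

```python
import torch
import torch.nn as nn

class PredictionInfoDQN(nn.Module):
    """Sketch: a Q-network augmented with an empirical prediction module.
    The prediction module (hypothetical) encodes transitions sampled from
    the replay buffer into a feature vector that is concatenated with the
    state features before the Q-value head."""

    def __init__(self, state_dim, action_dim, pred_dim=32, hidden=128):
        super().__init__()
        # encodes the current state
        self.state_encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU())
        # encodes replayed (s, a, r, s') experience into predictive info;
        # each transition is assumed pre-flattened into one vector
        self.prediction_module = nn.Sequential(
            nn.Linear(2 * state_dim + 2, pred_dim), nn.ReLU())
        # Q head consumes state features plus the predictive information
        self.q_head = nn.Linear(hidden + pred_dim, action_dim)

    def forward(self, state, replayed_transitions):
        h = self.state_encoder(state)                     # (B, hidden)
        p = self.prediction_module(replayed_transitions)  # (B, K, pred_dim)
        p = p.mean(dim=1)                                 # pool over K samples
        return self.q_head(torch.cat([h, p], dim=-1))     # (B, action_dim)
```

Note that the predictive features are computed directly from transitions sampled out of the experience replay buffer, so no environment model has to be learned or simulated, matching the data-utilization argument in contribution (1).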
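For contribution (2), a rough sketch of coupling a recurrent predictive state representation with a Q-value head might look as follows; the GRU-based tracker, the observation-prediction head, and all names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class RecurrentPSRDQN(nn.Module):
    """Sketch of a recurrent predictive-state model combined with deep
    Q-learning for a POMDP with continuous observations. A GRU tracks a
    predictive state from the action-observation history; an observation
    head supplies the model-learning signal, and a Q head maps the
    tracked state to action values."""

    def __init__(self, obs_dim, action_dim, state_dim=64):
        super().__init__()
        # state tracker: consumes (previous action one-hot, observation)
        self.tracker = nn.GRU(obs_dim + action_dim, state_dim,
                              batch_first=True)
        # predicts the next observation from the tracked state
        self.obs_head = nn.Linear(state_dim, obs_dim)
        # value head for control
        self.q_head = nn.Linear(state_dim, action_dim)

    def forward(self, act_obs_seq, h0=None):
        states, hN = self.tracker(act_obs_seq, h0)  # (B, T, state_dim)
        return self.obs_head(states), self.q_head(states), hN
```

In a training loop of this kind, obs_head would be fit to the actual next observations (e.g., with a mean-squared-error loss) so that the hidden state behaves like a predictive state representation, while q_head is trained with the usual temporal-difference loss of deep Q-learning.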
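Finally, for contribution (3), one plausible reading of replacing the max-pooling value backup of a QMDP-net-style module with a gated recurrent update is sketched below; the kernel sizes, the per-cell GRUCell, and the number of iteration steps k are all assumptions rather than the thesis's design.

```python
import torch
import torch.nn as nn

class RecurrentVIModule(nn.Module):
    """Sketch of a recurrent-convolutional value-iteration module.
    A conv layer plays the role of the transition model, producing
    per-action Q maps from the current value map; a GRU cell then
    performs a gated update of each cell's value estimate instead of
    hard-overwriting it with the maximum."""

    def __init__(self, n_actions, k=10):
        super().__init__()
        self.k = k  # number of value-iteration steps
        # value map + reward map -> per-action Q maps
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1)
        # gated recurrent update applied per grid cell
        self.gru = nn.GRUCell(input_size=1, hidden_size=1)

    def forward(self, reward_map):
        # reward_map assumed to be (B, 1, H, W)
        B, _, H, W = reward_map.shape
        v = torch.zeros_like(reward_map)
        for _ in range(self.k):
            q = self.q_conv(torch.cat([v, reward_map], dim=1))
            backup = q.max(dim=1, keepdim=True).values     # (B, 1, H, W)
            # gated update: blend the backed-up value into the old one
            v = self.gru(backup.reshape(-1, 1),
                         v.reshape(-1, 1)).reshape(B, 1, H, W)
        return v
```

The intuition is that the gate lets the network interpolate between the old value map and the backed-up one instead of overwriting it outright, which is one way a gated recurrent unit could stabilize the optimization of the value-iteration module.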
Keywords/Search Tags:Deep Learning, Reinforcement Learning, Value Function, Sequential Decision