
Research on Algorithms and Architectures for the Deep Q-Network

Posted on: 2018-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zhai
Full Text: PDF
GTID: 2348330542965214
Subject: Computer Science and Technology
Abstract/Summary:
Deep reinforcement learning (DRL) is a new research hotspot in the field of machine learning. Through a general-purpose formulation, DRL integrates the perception ability of deep learning (DL) with the decision-making ability of reinforcement learning (RL), and learns a mapping from raw inputs to action outputs in an end-to-end fashion. Building on this capacity for visual perception, DRL has made substantial breakthroughs in a variety of large-scale decision-making tasks. In particular, the Deep Q-Network (DQN) achieves human-level control on a range of video games. However, in complex problems that approach real-world scenarios, DQN suffers from sparse and delayed rewards, partially observed states, slow convergence, and unstable performance. To alleviate these issues, this thesis proposes three novel deep reinforcement learning methods that improve either the training algorithm or the model architecture of DQN. The main contributions are outlined as follows:

i. To address the problem that DQN cannot differentiate the importance of distinct transitions, this thesis proposes Deep Q-Learning with Prioritized Sampling. Compared with DQN, the proposed algorithm replaces uniform random sampling with an efficient priority-based experience replay mechanism in order to increase the utilization of valuable samples. Furthermore, the algorithm ensures that every transition in the sample space can be replayed with a certain probability, which ultimately improves the convergence rate (see the first sketch below).

ii. In view of the problem that DQN is not good at strategic decision-making tasks, this thesis proposes a novel model architecture: a deep recurrent Q-Network based on a visual attention mechanism. The model has two main innovations: (i) it uses a recurrent neural network consisting of two layers of gated recurrent units to retain historical state information across multiple time steps, so that the agent can exploit delayed feedback in time to guide its next action selection online; (ii) a visual attention mechanism lets the agent adaptively focus on smaller but more valuable regions of the input image, which reduces the number of weights to be trained and accelerates learning of near-optimal policies (see the second sketch below).

iii. To alleviate the instability of the deep deterministic policy gradient (DDPG) algorithm on continuous-action tasks, this thesis proposes deep deterministic policy gradient with mixed update targets. The new algorithm combines on-policy Monte Carlo (MC) estimation with off-policy Q-learning to generate mixed target Q-values. This reduces the error in estimating target Q-values and improves the performance and stability of the algorithm on continuous action space tasks (see the third sketch below).
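A minimal sketch of priority-based experience replay as described in contribution i, assuming proportional prioritization of the form p = (|TD error| + eps)^alpha in the style of Schaul et al.; the thesis's exact prioritization scheme may differ. The eps term guarantees that every stored transition keeps a nonzero sampling probability.

```python
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, eps=1e-2):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every transition sampleable
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition, td_error):
        # New transitions receive a priority derived from their TD error.
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = (abs(td_error) + self.eps) ** self.alpha
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Every stored transition has probability > 0 because eps > 0.
        p = self.priorities[:len(self.data)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # Re-prioritize replayed transitions with their fresh TD errors.
        for i, e in zip(idx, td_errors):
            self.priorities[i] = (abs(e) + self.eps) ** self.alpha
```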
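A minimal PyTorch sketch of the architecture described in contribution ii: a convolutional encoder, a soft attention map over its spatial feature locations conditioned on the recurrent state, and a two-layer GRU whose hidden state summarizes several past time steps. The layer sizes and the additive attention form are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class AttentionDRQN(nn.Module):
    def __init__(self, n_actions, feat_dim=64, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(                  # 84x84 grayscale frame
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=1), nn.ReLU(),
        )
        self.attn = nn.Linear(feat_dim + hidden, 1)    # score per location
        self.gru = nn.GRU(feat_dim, hidden, num_layers=2, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, frames, h=None):
        # frames: (batch, time, 1, 84, 84); h: optional GRU hidden state
        b, t = frames.shape[:2]
        q_values = []
        for step in range(t):
            f = self.encoder(frames[:, step])          # (b, c, H, W)
            f = f.flatten(2).transpose(1, 2)           # (b, H*W, c)
            # Condition attention on the top GRU layer's previous state.
            prev = f.new_zeros(b, self.gru.hidden_size) if h is None else h[-1]
            scores = self.attn(torch.cat(
                [f, prev.unsqueeze(1).expand(-1, f.size(1), -1)], dim=-1))
            weights = torch.softmax(scores, dim=1)     # focus on few regions
            context = (weights * f).sum(dim=1)         # (b, c)
            out, h = self.gru(context.unsqueeze(1), h)
            q_values.append(self.q_head(out[:, -1]))
        return torch.stack(q_values, dim=1), h         # (b, t, n_actions)
```

Because the GRU state carries information across steps, the Q-value at each step can reflect feedback from earlier frames rather than only the current observation, which is the property the abstract attributes to the recurrent design.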
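A minimal sketch of the mixed-target idea in contribution iii: blend the on-policy Monte Carlo return of an episode with the usual off-policy one-step DDPG bootstrap target. The mixing weight `beta` and the function names here are illustrative assumptions.

```python
import torch

def mixed_targets(rewards, next_states, dones, mc_returns,
                  target_actor, target_critic, gamma=0.99, beta=0.5):
    # Standard off-policy bootstrapped target: r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        next_actions = target_actor(next_states)
        q_boot = target_critic(next_states, next_actions).squeeze(-1)
        td_target = rewards + gamma * (1.0 - dones) * q_boot
    # Blend with the unbiased but higher-variance Monte Carlo return;
    # the mixture trades bias against variance when estimating targets.
    return beta * mc_returns + (1.0 - beta) * td_target
```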
Keywords/Search Tags: Deep Learning, Reinforcement Learning, Deep Reinforcement Learning, Deep Q-Network, Deep Deterministic Policy Gradient