
Research on Advancing Deep Reinforcement Learning

Posted on: 2019-01-31    Degree: Master    Type: Thesis
Country: China    Candidate: J M Xiao    Full Text: PDF
GTID: 2428330590975370    Subject: Computer technology
Abstract/Summary:
Reinforcement learning is a subfield of machine learning. It performs well on Markov decision process problems such as industrial control, board games, and autonomous driving. Deep neural network research has likewise become an important subfield of machine learning, after problems such as vanishing gradients were solved and the floating-point performance of computers increased. Combining the strengths of reinforcement learning and deep neural networks has therefore become a research hotspot. In DeepMind's work on the Deep Q Network (DQN), which was inspired by the Neural Fitted Q (NFQ) algorithm, a deep neural network is used as a function approximator for the Q-value table, and an experience replay mechanism is used to keep the training samples approximately independent and identically distributed. DQN achieves good results on video games. However, as a temporal-difference method, the DQN algorithm is not efficient enough. This thesis attempts to improve the DQN algorithm from two perspectives: 1) accelerating the update process by searching for upper and lower bounds on the Q value of a state-action pair, and 2) adjusting the sampling probabilities in the experience replay pool.

The theoretical basis of the deep Q network algorithm (DQN) is the Bellman equation, but because DQN is model-free, it cannot plan a policy in a bottom-up manner with dynamic programming. Instead, it is a temporal-difference method in which the agent continuously interacts with the environment and repeats the update process until convergence, which is very time-consuming. Therefore, this thesis proposes a novel method that computes an upper and a lower bound on the Q value of a state-action pair and uses them to constrain the evaluation of its target Q value, improving the performance of DQN. On the other hand, in the prioritized experience replay method, the sampling probabilities of the states drawn from the experience replay pool are updated according to their TD errors, but the states preceding them are left unchanged. This thesis argues that the new Q values should be propagated to the preceding states, so the sampling probabilities of those states should also be increased. This improves the efficiency of prioritized experience replay. Simulation results show the effectiveness of our algorithms.
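
As an illustration only (the abstract gives no pseudocode), a minimal Python sketch of the first idea might look as follows. The function name and the way the bounds q_lower and q_upper are obtained are assumptions, not the thesis's specification; the sketch only shows how such bounds could constrain a DQN-style target.

import numpy as np

def clipped_td_target(reward, next_q_values, gamma, q_lower, q_upper):
    # Standard DQN target: r + gamma * max_a' Q(s', a').
    target = reward + gamma * float(np.max(next_q_values))
    # Constrain the target to the assumed bounds on the true Q value
    # of the current state-action pair (how these bounds are derived
    # is not specified here and is treated as given).
    return float(np.clip(target, q_lower, q_upper))

# Example: clipped_td_target(1.0, np.array([0.2, 0.5]), 0.99, 0.0, 10.0)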
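
For the second idea, the following is a minimal sketch of a proportional prioritized replay buffer that, when a transition's priority is updated from its TD error, also raises the priority of the immediately preceding transition. The class name, the propagate factor, and the exact propagation rule are assumptions introduced only to illustrate the mechanism described above.

import numpy as np

class PropagatingReplayBuffer:
    # Proportional prioritized replay that also boosts the priority of the
    # transition stored just before an updated one, so that new Q values
    # reach earlier states sooner.
    def __init__(self, capacity, alpha=0.6, propagate=0.5):
        self.capacity = capacity
        self.alpha = alpha          # priority exponent
        self.propagate = propagate  # assumed propagation factor
        self.data = []              # transitions, stored in time order
        self.priorities = []        # one priority per transition

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, indices, td_errors):
        for i, err in zip(indices, td_errors):
            new_p = abs(err) + 1e-6
            self.priorities[i] = new_p
            # Propagate part of the new priority to the previous
            # transition, whose target Q value is likely to change too.
            if i > 0:
                self.priorities[i - 1] = max(self.priorities[i - 1],
                                             self.propagate * new_p)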
Keywords/Search Tags: reinforcement learning, neural network, Q-learning, optimization