Deep reinforcement learning closely mirrors the way humans make decisions and is finding ever wider application, for example in the game of Go, autonomous driving, and financial markets. The deep Q-network (DQN) is a common deep reinforcement learning method, but it has several shortcomings: a large amount of computation, slow or even failed convergence, and a tendency to overestimate action values. To address these problems, this paper proposes a fractional-order optimization method for pre-training the network and improves the data structure of the experience queue. The main contents of this paper are as follows:

1. To accelerate the convergence of the deep Q-network, supervised learning and reinforcement learning are combined with fractional calculus and neural networks, and a pre-training model for the deep Q-network based on an improved Grünwald–Letnikov (GL) fractional order is proposed. First, because introducing fractional calculus produces negative bases raised to fractional powers, a sign-separation method is proposed. Second, to reduce the large amount of computation, a divide-and-conquer method is designed to find the optimal fractional order; its time complexity is proved theoretically to be O(log2 n). Finally, the method is validated on a handwriting recognition dataset and a market transaction dataset. The results show that the proposed algorithm outperforms the integer-order model in both accelerating convergence and suppressing overestimation.

2. Because training a deep Q-network requires a large number of samples, this paper proposes a new experience-queue data structure based on the original experience queue and designs a new experience-queue scheduling algorithm. First, the replay count and the value ratio are introduced, and the value ratio is passed through a logit function to polarize the samples. The value ratio is then used to generate a sampling distribution function. Sampling from this distribution reduces the chance of repeated or low-value sampling that occurs in traditional experience replay and improves the replay performance of the experience queue. Finally, the improved algorithm is applied to decision-data analysis, with good results.
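The sign-separation idea in contribution 1 can be illustrated with a minimal sketch of a fractional-order gradient step. This is not the thesis's exact update rule: the function name `frac_grad_step`, the use of the previous iterate as the expansion point, and the learning rate are all illustrative assumptions. The key point shown is that a non-integer power of a negative base is complex-valued, so the base is split into its sign and its magnitude before the power is taken.

```python
import numpy as np
from math import gamma

def frac_grad_step(w, grad, w_prev, alpha=0.9, lr=0.01):
    """One fractional-order gradient-descent step (illustrative sketch).

    A non-integer power of a negative number is complex-valued, so the
    base is separated into sign and magnitude ("sign separation"):
        (w - w_prev)^(1-alpha)  ->  sign(w - w_prev) * |w - w_prev|^(1-alpha)
    The Gamma-function factor comes from the fractional-derivative
    definition; eps guards against a zero base.
    """
    diff = w - w_prev
    eps = 1e-8
    frac_term = np.sign(diff) * (np.abs(diff) + eps) ** (1.0 - alpha)
    step = lr * grad * frac_term / gamma(2.0 - alpha)
    return w - step
```

With `alpha = 1` the fractional term collapses to the sign of the displacement scaled by 1, recovering an integer-order-style update; values of `alpha` below 1 shrink the step near the expansion point.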
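The divide-and-conquer search for the optimal fractional order can be sketched as follows. The thesis does not specify the exact scheme, so this is one plausible instance: assuming the validation loss is unimodal in the order alpha, a ternary search shrinks the candidate interval by a constant factor per iteration, giving the stated O(log n) number of loss evaluations for n candidate orders. The bounds `lo`, `hi` and the function name `best_order` are assumptions.

```python
def best_order(loss, lo=0.1, hi=1.9, tol=1e-3):
    """Ternary search for the fractional order minimizing `loss`.

    Assumes loss(alpha) is unimodal on [lo, hi].  Each iteration
    discards one third of the interval, so the number of iterations
    is logarithmic in the number of candidate orders.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loss(m1) < loss(m2):
            hi = m2          # minimum cannot lie in (m2, hi]
        else:
            lo = m1          # minimum cannot lie in [lo, m1)
    return (lo + hi) / 2.0
```

In practice each call to `loss` would train or evaluate the pre-training model at that order, so cutting the evaluation count from linear to logarithmic is what makes the search affordable.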
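The value-ratio sampling in contribution 2 can be sketched in a few lines. This is an assumed reading of the abstract, not the thesis's algorithm: it takes each transition's value ratio in (0, 1), applies the logit transform log(v / (1 - v)) to stretch ratios near 0 and 1 apart (the "polarization"), and normalizes the scores into a sampling distribution over buffer indices. The function name and the softmax normalization are illustrative choices.

```python
import numpy as np

def sample_indices(value_ratio, batch_size, rng=None):
    """Draw replay-buffer indices from a value-ratio distribution.

    The logit transform polarizes the ratios; a softmax turns the
    scores into probabilities, so high-value transitions are replayed
    more often than under uniform sampling, while replace=False rules
    out duplicates within one batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.clip(np.asarray(value_ratio, dtype=float), 1e-6, 1 - 1e-6)
    scores = np.log(v / (1.0 - v))      # logit polarization
    p = np.exp(scores - scores.max())   # stable softmax
    p /= p.sum()                        # sampling distribution
    return rng.choice(len(v), size=batch_size, replace=False, p=p)
```

Compared with uniform replay, transitions whose value ratio is near 0.5 contribute little probability mass, which is the intended reduction of repeated or low-value sampling.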