Deep reinforcement learning closely mirrors the way humans make decisions and is finding ever wider application, for example in the game of Go, autonomous driving, and financial markets. The deep Q-network (DQN) is a common deep reinforcement learning method, but it has several shortcomings: a large amount of computation, slow or even failed convergence, and a tendency to overestimate action values. To address these problems, this paper proposes a fractional-order optimization method for pre-training the network and improves the data structure of the experience queue. The main contents of this paper are as follows:

1. To accelerate the convergence of the deep Q-network, supervised learning and reinforcement learning are combined with fractional calculus and neural networks, and a pre-training model for the deep Q-network based on an improved Grünwald–Letnikov (GL) fractional order is proposed. First, because introducing fractional calculus produces negative bases raised to fractional powers, a sign-separation method is proposed. Second, to reduce the large amount of computation, a divide-and-conquer method is designed to find the optimal fractional order; its time complexity is proved theoretically to be O(log2 n). Finally, the method is validated on a handwriting recognition dataset and a market transaction dataset. The results show that the proposed algorithm outperforms the integer-order model in both accelerating convergence and suppressing overestimation.

2. Because training a deep Q-network requires a large number of samples, this paper proposes a new experience-queue data structure based on the original experience queue and designs a new experience-queue scheduling algorithm. First, the replay count and the value ratio are introduced, and the value ratio is passed through a logit function to polarize the samples. The value ratio is then used to generate a sampling distribution function. Sampling from this distribution reduces the chance of repeated or low-value sampling that occurs in traditional experience replay and improves the replay performance of the experience queue. Finally, the improved algorithm is applied to decision-data analysis, with good results.
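The sign-separation idea in contribution 1 can be illustrated with a minimal sketch of a fractional-order gradient step. This is not the thesis's exact update rule: the function name `frac_grad_step`, the use of the previous iterate as the expansion point, and the learning rate are all illustrative assumptions. The key point shown is that a non-integer power of a negative base is complex-valued, so the base is split into its sign and its magnitude before the power is taken.

```python
import numpy as np
from math import gamma

def frac_grad_step(w, grad, w_prev, alpha=0.9, lr=0.01):
    """One fractional-order gradient-descent step (illustrative sketch).

    A non-integer power of a negative number is complex-valued, so the
    base is separated into sign and magnitude ("sign separation"):
        (w - w_prev)^(1-alpha)  ->  sign(w - w_prev) * |w - w_prev|^(1-alpha)
    The Gamma-function factor comes from the fractional-derivative
    definition; eps guards against a zero base.
    """
    diff = w - w_prev
    eps = 1e-8
    frac_term = np.sign(diff) * (np.abs(diff) + eps) ** (1.0 - alpha)
    step = lr * grad * frac_term / gamma(2.0 - alpha)
    return w - step
```

With `alpha = 1` the fractional term collapses to the sign of the displacement scaled by 1, recovering an integer-order-style update; values of `alpha` below 1 shrink the step near the expansion point.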
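The divide-and-conquer search for the optimal fractional order can be sketched as follows. The thesis does not specify the exact scheme, so this is one plausible instance: assuming the validation loss is unimodal in the order alpha, a ternary search shrinks the candidate interval by a constant factor per iteration, giving the stated O(log n) number of loss evaluations for n candidate orders. The bounds `lo`, `hi` and the function name `best_order` are assumptions.

```python
def best_order(loss, lo=0.1, hi=1.9, tol=1e-3):
    """Ternary search for the fractional order minimizing `loss`.

    Assumes loss(alpha) is unimodal on [lo, hi].  Each iteration
    discards one third of the interval, so the number of iterations
    is logarithmic in the number of candidate orders.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loss(m1) < loss(m2):
            hi = m2          # minimum cannot lie in (m2, hi]
        else:
            lo = m1          # minimum cannot lie in [lo, m1)
    return (lo + hi) / 2.0
```

In practice each call to `loss` would train or evaluate the pre-training model at that order, so cutting the evaluation count from linear to logarithmic is what makes the search affordable.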
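The value-ratio sampling in contribution 2 can be sketched in a few lines. This is an assumed reading of the abstract, not the thesis's algorithm: it takes each transition's value ratio in (0, 1), applies the logit transform log(v / (1 - v)) to stretch ratios near 0 and 1 apart (the "polarization"), and normalizes the scores into a sampling distribution over buffer indices. The function name and the softmax normalization are illustrative choices.

```python
import numpy as np

def sample_indices(value_ratio, batch_size, rng=None):
    """Draw replay-buffer indices from a value-ratio distribution.

    The logit transform polarizes the ratios; a softmax turns the
    scores into probabilities, so high-value transitions are replayed
    more often than under uniform sampling, while replace=False rules
    out duplicates within one batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.clip(np.asarray(value_ratio, dtype=float), 1e-6, 1 - 1e-6)
    scores = np.log(v / (1.0 - v))      # logit polarization
    p = np.exp(scores - scores.max())   # stable softmax
    p /= p.sum()                        # sampling distribution
    return rng.choice(len(v), size=batch_size, replace=False, p=p)
```

Compared with uniform replay, transitions whose value ratio is near 0.5 contribute little probability mass, which is the intended reduction of repeated or low-value sampling.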