With the development of technology, deep learning has become the general trend. Data, the network model, and the optimization algorithm are the most critical factors determining the performance of a neural network, and the optimization algorithm directly determines the values of all parameters of the network model. Proper optimization of the neural network is therefore essential. On the one hand, fractional calculus allows the order of differentiation to be adjusted continuously, which offers new possibilities for improving gradient descent algorithms; on the other hand, the energy dissipation law and numerical stability theory of time-fractional equations provide a new perspective for studying parameter averaging. Current optimization algorithms for neural network training are mainly gradient descent algorithms and their variants. Parameter averaging in neural network training can be regarded as memory-dependent, and memory dependence is precisely one of the key characteristics of the time-fractional derivative; moreover, parameter averaging itself is not convenient for theoretical analysis, whereas the time-fractional derivative form is more amenable to it.

This paper studies the influence of memory dependence on neural network training, proposes a direction for improving gradient descent, and introduces the time-fractional gradient descent algorithm (TFGD). TFGD applies a weighted average to the weights produced at each update iteration of the underlying gradient descent algorithm, thereby reducing the training loss of the neural network. The algorithm is studied through a combination of theoretical derivation and neural network training experiments. Compared with ordinary gradient descent (GD) and stochastic gradient descent (SGD), TFGD achieves a markedly better optimization effect when the learning rate is suitable and the fractional order is close to 1. In an empirical analysis on the MNIST data set, GD and SGD are compared with TFGD under different learning rates; the results verify that, at a suitable learning rate, TFGD with fractional orders 0.95 and 0.99 optimizes neural network training significantly, supporting the conclusion that memory dependence affects neural network training.
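The abstract describes TFGD only as a memory-weighted average over the iterates of an underlying gradient descent method, without giving the exact update rule. The following is a minimal sketch of that idea, assuming (this is our assumption, not stated in the abstract) that the memory weights come from the standard L1 discretization of the Caputo time-fractional derivative of order alpha in (0, 1), normalized so each iterate is replaced by a convex combination of the trajectory with the newest point weighted most heavily; all names (`l1_weights`, `tfgd`) are hypothetical.

```python
import numpy as np

def l1_weights(k, alpha):
    # Hypothetical memory weights from the L1 discretization of the
    # Caputo time-fractional derivative of order alpha in (0, 1):
    # c_j = (j+1)^(1-alpha) - j^(1-alpha), j = 0..k-1, newest first.
    # Since x^(1-alpha) is concave, c_0 > c_1 > ... > c_{k-1} > 0.
    j = np.arange(k)
    c = (j + 1) ** (1 - alpha) - j ** (1 - alpha)
    return c / c.sum()  # normalize to a convex combination

def tfgd(grad, theta0, lr=0.1, alpha=0.95, steps=50):
    """Time-fractional gradient descent sketch: take a plain gradient
    step, then replace the new iterate by a memory-weighted average of
    the whole trajectory (most recent iterate weighted the most)."""
    history = [np.asarray(theta0, dtype=float)]
    for _ in range(steps):
        theta = history[-1] - lr * grad(history[-1])
        history.append(theta)
        w = l1_weights(len(history), alpha)
        # Convex combination over all iterates so far, newest first.
        history[-1] = sum(wi * h for wi, h in zip(w, reversed(history)))
    return history[-1]

# Toy quadratic loss f(theta) = ||theta||^2 / 2, so grad(theta) = theta.
theta_star = tfgd(lambda t: t, theta0=[3.0, -2.0], alpha=0.95)
```

As alpha approaches 1, nearly all weight concentrates on the newest iterate and the scheme approaches plain gradient descent, which is consistent with the abstract's finding that orders near 1 (0.95 and 0.99) work best at a suitable learning rate.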