
Research On Optimization Method Of Statistical Model Based On Natural Gradient

Posted on: 2020-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Xie
Full Text: PDF
GTID: 2417330596475287
Subject: Statistics
Abstract/Summary:
Classification and regression are the two main problems of machine learning. The usual approach to solving them is to build a statistical model with parameters, train it on observed sample data to obtain the optimal model, and then construct the corresponding classifier or fitter. Linear classification mainly uses the logistic regression model, while nonlinear classification and regression tasks mainly use neural network models. The most popular method for training these models is the stochastic gradient descent algorithm. However, gradient descent uses only first-order information about the objective function, and because neural networks are highly non-convex, the error surface has many plateau regions, so training is often slow and more advanced algorithms are needed. The natural gradient algorithm can solve this problem by effectively avoiding the plateau regions of the objective.

To implement a natural gradient algorithm, we need to compute the information matrix and its inverse. When the model has many parameters, computing the natural gradient direction incurs a large computation and storage cost. Traditionally, the information matrix is defined as the expectation of the outer product of the gradient vector with itself, so the number of its elements is the square of the dimension of the gradient vector.
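To make this cost concrete, the classical definitions can be written out as follows. This is the generic textbook formulation of the natural gradient update, with loss L and model distribution p_theta, not notation taken from the thesis itself:

    F(\theta) = \mathbb{E}_{x \sim p_\theta}\left[ \nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top} \right],
    \qquad
    \theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1}\, \nabla_\theta L(\theta_t).

For a model with n parameters, F(theta) has n^2 entries and a direct inversion costs O(n^3), which is exactly the computation and storage burden that the simplified algorithms below are designed to reduce.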
By instead keeping the parameters in the form of a matrix, a new natural gradient algorithm can be established, called the simplified natural gradient algorithm. Because the simplified natural gradient algorithm is unstable, we propose an improved algorithm, the simplified adaptive natural gradient algorithm, which requires less computation and storage than the simplified natural gradient and whose convergence speed, according to our experiments, even exceeds that of the momentum method.

The main contents of this paper are summarized as follows:

1. The origin of the natural gradient algorithm is described. The natural gradient is the steepest stochastic gradient direction when distances are measured between probability distributions, and this is justified from several viewpoints: the KL divergence, the Bayesian view, the Cramer-Rao lower bound, and whitening of the parameter space. In addition, existing practical natural gradient algorithms, that is, ways of computing the information matrix and its inverse, are surveyed, and the reason why the natural gradient algorithm converges quickly is discussed.

2. The simplified adaptive natural gradient algorithm is introduced. It improves on the simplified natural gradient algorithm, whose starting point is to keep the parameters in matrix form in order to reduce the amount of computation. Since the empirical block information matrix is prone to being singular, the simplified algorithm oscillates without converging in its later stage, and its error hardly declines in the early stage. The simplified adaptive natural gradient algorithm instead uses the true block information matrix as the scaling matrix (a rough sketch of this kind of update is given after this list). Experiments show that the error then decreases rapidly and the algorithm converges smoothly in the later stage.

3. Implementations of the simplified adaptive natural gradient algorithm on several models are given, and the feasibility of the algorithm is illustrated theoretically. The computational complexity of the algorithm is also given; it is much smaller than that of existing second-order methods. Finally, a further improvement of the algorithm is given: by adding a momentum term, the algorithm converges even faster.
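The abstract does not spell out the exact update rules, so the following Python sketch is only an illustration of the general idea under stated assumptions: the parameters of one layer are kept as a matrix, a per-layer (block) information matrix is tracked with an exponential moving average, and its damped inverse is used as the scaling matrix. All names (simplified_adaptive_ng_step, fisher_ema, damping) and the averaging scheme are hypothetical, not taken from the thesis.

    import numpy as np

    def simplified_adaptive_ng_step(W, grad_W, fisher_ema,
                                    lr=0.1, beta=0.95, damping=1e-4):
        """One illustrative block natural-gradient step for a weight matrix W.

        Instead of forming the full information matrix over all
        n = out_dim * in_dim parameters (n**2 entries), only an
        out_dim x out_dim block is kept.
        """
        # Block information matrix estimated from the matrix-shaped gradient.
        block_info = grad_W @ grad_W.T / grad_W.shape[1]
        # Smooth the noisy per-step estimate with an exponential moving average.
        fisher_ema = beta * fisher_ema + (1.0 - beta) * block_info
        # Damping keeps the scaling matrix invertible; the abstract notes that
        # the raw empirical block matrix is prone to singularity.
        scale = np.linalg.inv(fisher_ema + damping * np.eye(fisher_ema.shape[0]))
        # Precondition the gradient with the inverse block information matrix.
        return W - lr * scale @ grad_W, fisher_ema

    # Toy usage: one step on a random 4 x 8 weight matrix.
    rng = np.random.default_rng(0)
    W, g = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    W, F = simplified_adaptive_ng_step(W, g, fisher_ema=np.eye(4))

Inverting the out_dim x out_dim block costs O(out_dim^3) per layer rather than O(n^3) for the full matrix, which is consistent with the abstract's claim that the method is much cheaper than existing second-order methods.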
Keywords/Search Tags: Natural gradient, stochastic gradient descent, simplified adaptive natural gradient, simplified natural gradient, neural network