
Research On Training And Optimization Methods Of Deep Neural Networks

Posted on: 2021-01-10    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Q Y Yuan    Full Text: PDF
GTID: 1368330611967081    Subject: Computer Science and Technology
Abstract/Summary:
Deep learning methods have been applied to many aspects of social production and daily life, such as object recognition, speech recognition, natural language processing, and autonomous driving, greatly raising the level of societal intelligence. However, training and optimizing deep neural networks is still considered difficult and demands considerable experience and skill. As an important part of the basic theory of deep learning, the training and optimization of deep neural networks provide fundamental support for deep learning applications. At present, most neural network initialization methods are independent of the network depth, the symmetries in the weight space of deep neural networks adversely affect training, Adam suffers from convergence and generalization problems, and the geometric properties of the loss surface of neural networks are not yet well understood. This dissertation focuses on how to train deep neural networks efficiently and how to solve these problems. The main contributions are as follows:

(1) A scaling-based weight normalization method for deep neural networks is proposed. The symmetries in the weight space of a neural network have a negative effect on its training. Several methods have been proposed to address this problem, but they incur a large computational cost. Our approach exploits the scaling invariance of the ReLU network itself: during training, the weights are adjusted by node-wise rescaling transformations, including a within-layer scaling adjustment as activations propagate forward and a between-layer scaling adjustment as gradients propagate backward (see the first code sketch below). Experimental results show that our normalization method uniformly improves the performance of various neural networks on various data sets.

(2) A Fixup orthogonal initialization is devised. To date, signal propagation and dynamical isometry in deep convolutional residual networks at initialization have not been studied. Using tools such as mean field theory, random matrix theory, and free probability theory, this dissertation derives a recursive formula for the covariance matrix of the activations in the feature maps of a deep convolutional residual network at initialization; this recursion has no fixed points. We also give a method for computing the eigenvalue density of the input-output Jacobian of the deep convolutional residual network. The asymptotic analysis shows that a necessary condition for achieving dynamical isometry is that the initialization must depend on the network depth. Based on this theoretical analysis and on the delta-orthogonal initialization for convolutional networks, we devise the depth-dependent Fixup orthogonal initialization for deep convolutional residual networks (see the second code sketch below). The effectiveness of the proposed initialization method is verified by extensive experiments.

(3) An adaptive gradient method with dynamic momentum and base learning rate is proposed. Recent studies have found that Adam has convergence problems and a generalization gap relative to SGDM. This dissertation analyzes how the base learning rate, the momentum coefficient, and the adaptive learning-rate coefficient influence the complex dynamics of Adam. Based on these analyses and with reference to the design idea of AdaBound, an adaptive gradient method with dynamic momentum and base learning rate is devised. For the first time, the direction cosine between successive gradients and the gradient norm during training are integrated into an Adam-type algorithm to adjust these coefficients (see the third code sketch below). In the later stage of training, the proposed algorithm can switch smoothly to SGDM by controlling these coefficients, improving generalization; it thus combines the fast convergence of Adam with the good generalization of SGD. Experiments on various machine learning tasks verify that the proposed Adam-type algorithm outperforms Adam, AMSGrad, and AdaBound.
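The first sketch below illustrates, for contribution (1), the function-preserving node-wise rescaling of ReLU units that the proposed normalization builds on. It is a minimal NumPy example under an assumed normalization target (unit-norm incoming weight vectors); the dissertation's actual within-layer and between-layer adjustment schedule is not reproduced here.

    # Node-wise rescaling of a ReLU layer: dividing unit j's incoming weights
    # and bias by c_j > 0 and multiplying its outgoing weights by c_j leaves
    # the network function unchanged (positive homogeneity of ReLU), so such
    # rescalings can renormalize weights without altering predictions.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((64, 32)), rng.standard_normal(64)
    W2 = rng.standard_normal((10, 64))

    def forward(x, W1, b1, W2):
        return W2 @ np.maximum(W1 @ x + b1, 0.0)

    x = rng.standard_normal(32)
    y_before = forward(x, W1, b1, W2)

    # Illustrative choice: normalize each unit's incoming weight row to unit
    # norm and let the outgoing weights absorb the scale.
    c = np.linalg.norm(W1, axis=1)
    W1n, b1n = W1 / c[:, None], b1 / c
    W2n = W2 * c[None, :]
    y_after = forward(x, W1n, b1n, W2n)

    print(np.allclose(y_before, y_after))   # True: predictions are unchanged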
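The second sketch concerns contribution (2). It builds a delta-orthogonal-style convolution kernel (an orthogonal center tap, zeros elsewhere) and applies a depth-dependent scale. The 1/sqrt(L) factor is only an illustrative, Fixup-style assumption; the precise depth dependence is derived in the dissertation.

    # Depth-aware, delta-orthogonal-style initializer for a residual branch.
    import numpy as np

    def delta_orthogonal_conv(c_out, c_in, k, depth_scale, rng):
        """k x k kernel that is zero except for an orthogonal center tap."""
        n = max(c_out, c_in)
        q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
        w = np.zeros((c_out, c_in, k, k))
        w[:, :, k // 2, k // 2] = q[:c_out, :c_in] * depth_scale
        return w

    rng = np.random.default_rng(0)
    L = 50                                  # number of residual blocks (assumed)
    scale = 1.0 / np.sqrt(L)                # assumed depth-dependent factor
    kernel = delta_orthogonal_conv(64, 64, 3, scale, rng)
    print(kernel.shape)                     # (64, 64, 3, 3)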
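The third sketch relates to contribution (3). The coupling rules below (more momentum when successive gradients align, a damped base learning rate when they oscillate) are illustrative assumptions made for this example only; the dissertation specifies the actual coefficient schedules and the AdaBound-style switch to SGDM.

    # Adam-style update whose momentum and base learning rate react to the
    # direction cosine between successive gradients.
    import numpy as np

    def cos_adam(grad_fn, x, steps=500, lr0=0.01, beta2=0.999, eps=1e-8):
        m, v = np.zeros_like(x), np.zeros_like(x)
        g_prev = None
        for t in range(1, steps + 1):
            g = grad_fn(x)
            if g_prev is None:
                cos = 1.0
            else:
                cos = g @ g_prev / (np.linalg.norm(g) * np.linalg.norm(g_prev) + eps)
            beta1 = 0.9 + 0.09 * max(cos, 0.0)       # aligned gradients -> more momentum
            lr = lr0 * 0.5 * (1.0 + max(cos, 0.0))   # oscillation -> smaller base step
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            x = x - lr * (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)
            g_prev = g
        return x

    # Example: minimize the ill-conditioned quadratic 0.5 * (x1^2 + 100 * x2^2).
    grad = lambda x: np.array([1.0, 100.0]) * x
    print(cos_adam(grad, np.array([3.0, 2.0])))      # approaches the origin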
(4) A monotonic policy optimization algorithm is proposed. A key problem in applying nonlinear function approximators such as deep neural networks to reinforcement learning is that the policy updates produced by many existing policy optimization algorithms may fail to improve policy performance monotonically and may even cause serious performance degradation. This dissertation therefore proposes a new lower bound on policy improvement in which an average policy divergence over the state space is penalized (its general form is restated below). Optimizing this lower bound directly is difficult because it demands a high computational overhead. Following the idea of trust region policy optimization (TRPO) and using generalized advantage estimation to estimate the advantage function, a monotonic policy optimization algorithm based on the new lower bound is proposed; it generates a sequence of monotonically improving policies and is suitable for large-scale continuous control problems. The proposed algorithm is evaluated against several existing algorithms on highly challenging robot locomotion tasks, and extensive experiments verify the effectiveness of the monotonic policy optimization method.

(5) An experimental exploration of the loss surface of deep neural networks is carried out, covering the trajectories of various adaptive optimization algorithms, the Hessian matrix of the loss function, and the curvature of the loss surface along those trajectories. It is found that the gradient direction of the various adaptive optimization algorithms is almost perpendicular to the eigenvector directions corresponding to the top eigenvalues of the loss surface, whereas the gradient direction of SGD shows no such pattern (the measurement is sketched in the code example below). The Hessian matrix of the loss surface along the trajectory of Adam is almost degenerate, which indicates that the assumption of a nonsingular Hessian made in many theoretical studies is unreasonable.
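For contribution (4), the general shape of a policy-improvement lower bound with an average-divergence penalty is restated below in LaTeX for orientation. The exact penalty coefficient and divergence measure used in the dissertation are not given in this abstract, so the TRPO-style form here should be read as a template rather than as the dissertation's bound.

    \eta(\tilde{\pi}) \;\ge\; L_{\pi}(\tilde{\pi}) \;-\; C\,\overline{D}(\pi,\tilde{\pi}),
    \qquad
    \overline{D}(\pi,\tilde{\pi}) \;=\; \mathbb{E}_{s \sim \rho_{\pi}}\!\left[ D\big(\pi(\cdot \mid s)\,\|\,\tilde{\pi}(\cdot \mid s)\big)\right],

where \eta is the expected return, L_{\pi}(\tilde{\pi}) = \eta(\pi) + \mathbb{E}_{s \sim \rho_{\pi},\, a \sim \tilde{\pi}}\left[A_{\pi}(s,a)\right] is the surrogate objective built from the advantage function A_{\pi} (estimated in practice with generalized advantage estimation), \rho_{\pi} is the state distribution of the current policy, and C > 0 is a penalty coefficient. If the inequality holds, maximizing the right-hand side at each update guarantees monotone improvement of \eta, since the bound is tight at \tilde{\pi} = \pi.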
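For contribution (5), the sketch below shows the kind of measurement such a study relies on: estimating the top Hessian eigenvector by power iteration on Hessian-vector products and comparing its direction with the current gradient. The tiny model, random data, and iteration count are placeholders chosen only to make the example self-contained.

    # Cosine between the loss gradient and the top Hessian eigenvector,
    # using Hessian-vector products obtained by double backpropagation.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                                torch.nn.Linear(32, 1))
    x, y = torch.randn(128, 20), torch.randn(128, 1)
    loss = F.mse_loss(model(x), y)

    params = list(model.parameters())
    grads = torch.autograd.grad(loss, params, create_graph=True)
    g = torch.cat([gi.reshape(-1) for gi in grads])

    v = torch.randn_like(g)
    v /= v.norm()
    for _ in range(30):                          # power iteration with HVPs
        hv = torch.autograd.grad(g @ v, params, retain_graph=True)
        v = torch.cat([h.reshape(-1) for h in hv])
        v /= v.norm()

    cosine = torch.dot(g.detach(), v) / (g.detach().norm() * v.norm())
    print(f"cos(gradient, top Hessian eigenvector) = {cosine.item():.4f}")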
(6) A scale-based ensembling method for deep neural networks is proposed. The key problem in introducing ensembling into deep neural networks is reducing the cost of training each individual model. Based on the diversity of network models near local minima and the scaling invariance of ReLU neurons, scale-based ensembling of deep neural networks (SBE) is proposed; it produces multiple neural network models with high test accuracy and good diversity at roughly the computational cost of training a single model to convergence (the weight-space effect it exploits is sketched below). Experimental results show that, at the same computational cost, our method yields higher test accuracy than state-of-the-art neural network ensembling methods such as SSE and FGE.
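The sketch below illustrates, for contribution (6), the weight-space effect that SBE exploits: node-wise rescalings move a trained ReLU network to a distant point in weight space while leaving its predictions unchanged, so additional candidate models are obtained without retraining from scratch. How SBE turns such points into genuinely diverse ensemble members is specified in the dissertation and not reproduced here.

    # Rescaled copies of a ReLU network: far apart in weight space,
    # identical in function.
    import numpy as np

    rng = np.random.default_rng(1)
    W1, W2 = rng.standard_normal((64, 32)), rng.standard_normal((10, 64))

    def predict(x, A1, A2):
        return A2 @ np.maximum(A1 @ x, 0.0)

    def rescaled_copy(W1, W2, rng):
        c = rng.uniform(0.25, 4.0, size=W1.shape[0])   # per-unit positive scales
        return W1 * c[:, None], W2 / c[None, :]

    x = rng.standard_normal(32)
    for A1, A2 in (rescaled_copy(W1, W2, rng) for _ in range(4)):
        dist = np.sqrt(np.sum((A1 - W1) ** 2) + np.sum((A2 - W2) ** 2))
        err = np.max(np.abs(predict(x, A1, A2) - predict(x, W1, W2)))
        print(f"weight distance {dist:8.2f}   prediction change {err:.2e}")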
Keywords/Search Tags: Normalization method of deep neural network, Initialization method of deep neural network, Adam-type algorithms, Loss surface of deep neural network, Ensembling of neural network, Policy optimization