
Research On Stochastic Optimization Methods For Large Scale Datasets

Posted on: 2020-10-26 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M X Tang | Full Text: PDF
GTID: 1488306548492574 | Subject: Computer Science and Technology
Abstract/Summary:
Machine learning, as the core technology underpinning artificial intelligence, has attracted wide attention. The growing scale of datasets poses new challenges for solving machine learning models with optimization methods. To improve the convergence speed and accuracy of stochastic optimization methods on large-scale data, this dissertation targets four problems: the low efficiency of stochastic variance reduction, the weak generalization ability of adaptive gradient methods, the slow convergence of stochastic quasi-Newton methods, and the heavy parameter-tuning burden of the Lookahead method. Accordingly, it studies flexible variance reduction for SGD, boundary scheduling for adaptive gradient methods, variance reduction for stochastic quasi-Newton methods, and learning rate scheduling for Lookahead. The main contributions are as follows.

Traditional variance reduction methods have two disadvantages on large-scale datasets: (1) excessive storage or computation overhead, and (2) frequent disk I/O. To address this, the dissertation proposes a flexible variance reduction method, named FVR-SGD, which reduces the extra overhead of traditional variance reduction and improves memory efficiency. FVR-SGD does not need to compute or store the full gradient; instead, variance reduction is carried out using historical gradient information. The method can flexibly partition the dataset into multiple subsets according to the available memory, which reduces disk I/O and improves computational efficiency. The dissertation also proves that FVR-SGD converges at a linear rate and gives the specific conditions under which convergence holds. Experimental results show that FVR-SGD reduces computational cost and speeds up convergence, with larger advantages on large-scale data and models with many parameters.

Adaptive gradient methods suffer from low convergence accuracy and weak generalization when training complex neural network models. The dissertation proposes a boundary scheduling method for adaptive gradient methods, named Bsadam, which improves convergence accuracy and generalization by bounding and scheduling the adaptive learning rate. Bsadam analyzes the factors that may degrade the generalization ability of adaptive gradient methods and proposes a countermeasure for each of them. To avoid the side effects of extreme learning rates, Bsadam places a reasonable boundary on the learning rate, which safeguards the convergence behavior of the algorithm. To improve convergence accuracy and generalization, Bsadam schedules this boundary according to the needs of each training stage: the lower bound is first raised so that the optimizer can find a better minimum, and the upper bound is then decreased to ensure convergence. Experimental results show that Bsadam not only keeps a fast convergence speed but also improves the convergence accuracy of adaptive gradient methods.
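To make the first idea concrete, the following Python sketch illustrates the general principle of history-based variance reduction on one memory-resident subset, in the spirit of FVR-SGD as described above. The abstract does not give the exact update rule, so this is only a minimal sketch under a SAGA-like assumption (stored per-sample gradients stand in for the "historical gradient information"); the names grad_fn, subset, and history are illustrative, not the dissertation's.

```python
import numpy as np

def history_vr_sgd(grad_fn, w, subset, lr=0.1, epochs=5, rng=None):
    """Illustrative SAGA-like sketch of history-based variance reduction
    on one memory-resident subset; not the exact FVR-SGD update.

    grad_fn(w, i) returns the gradient of sample i at parameters w.
    The stored per-sample gradients play the role of the "historical
    gradient information"; no full-dataset gradient pass is needed.
    """
    rng = rng or np.random.default_rng(0)
    n = len(subset)
    history = np.array([grad_fn(w, i) for i in subset])  # stale gradients
    avg = history.mean(axis=0)                           # their running mean
    for _ in range(epochs):
        for _ in range(n):
            k = rng.integers(n)
            g_new = grad_fn(w, subset[k])
            # variance-reduced estimate: fresh gradient, minus its stale
            # version, plus the mean of all stored gradients
            w = w - lr * (g_new - history[k] + avg)
            avg += (g_new - history[k]) / n              # keep the mean in sync
            history[k] = g_new
    return w
```

Partitioning the full dataset into such memory-sized subsets and processing them one at a time is what avoids both a full-gradient pass and repeated disk I/O.

Similarly, the boundary scheduling idea behind Bsadam can be illustrated by clipping Adam's per-coordinate step size to an interval whose bounds are scheduled over training: the lower bound is raised first, then the upper bound is lowered. The concrete two-stage schedule and all constants below are assumptions for illustration only, not the schedule used in the dissertation.

```python
import numpy as np

def bounded_adam_step(w, g, m, v, t, base_lr=1e-3, beta1=0.9, beta2=0.999,
                      eps=1e-8, lower=0.0, upper=np.inf):
    """One Adam step whose per-coordinate step size is clipped to
    [lower, upper]; scheduling those two bounds is the boundary idea."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    step_size = base_lr / (np.sqrt(v_hat) + eps)   # adaptive per-coordinate lr
    step_size = np.clip(step_size, lower, upper)   # enforce the boundary
    return w - step_size * m_hat, m, v

def bounds(t, total_steps, final_lr=0.01, lr_max=0.1):
    """Assumed two-stage schedule: first raise the lower bound so the
    optimizer keeps reasonably large steps while searching for a good
    minimum, then lower the upper bound so the steps shrink and the
    method converges."""
    half = total_steps // 2
    if t <= half:                                  # stage 1: lift the lower bound
        return final_lr * t / half, lr_max
    frac = (t - half) / (total_steps - half)       # stage 2: pull the upper bound down
    return final_lr, lr_max - (lr_max - final_lr) * frac
```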
Stochastic quasi-Newton methods suffer from unstable construction of the correction vector pairs and slow convergence. The dissertation proposes an acceleration framework for stochastic quasi-Newton methods based on a variety of variance reduction techniques, which improves the accuracy of the stochastic gradient estimates and the stability of the correction-vector construction. Because two adjacent iterations estimate the gradient on different data, the gradients they compute can differ greatly, which makes the correction vectors jitter and biases the update direction of the algorithm. In the proposed framework, variance reduction methods such as SVRG, SAGA, and SAG can be combined with stochastic quasi-Newton methods to keep the update direction stable. Furthermore, the dissertation proposes a lightweight variance reduction method with no extra computation or storage overhead, which makes the algorithm more efficient on large-scale datasets. Experimental results show that the variance-reduced stochastic quasi-Newton acceleration methods achieve clear speedups on large datasets.

The Lookahead method built on SGD requires manual parameter tuning and converges slowly. This dissertation proposes cyclic and single-cycle learning rate scheduling for Lookahead, which avoid manual tuning and improve convergence speed. Both schedules only require a reasonable learning rate interval to be specified, so no additional tuning cost is incurred. Cyclic scheduling divides the whole training process into several cycles; within each cycle the learning rate decreases from high to low, which lets Lookahead cross local minima and converge to a nearby good solution in every cycle, so that over multiple cycles the algorithm finally settles on a better minimum. Single-cycle scheduling divides training into two stages: in the first stage the learning rate increases from low to high, which lets Lookahead quickly move past poor local solutions toward a more suitable one; in the second stage the learning rate is continually decreased so that Lookahead converges there. Experimental results show that these learning-rate-scheduling-based acceleration methods improve both convergence speed and accuracy.
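As an illustration of the variance-reduction framework for stochastic quasi-Newton methods, the sketch below shows its SVRG instantiation: every mini-batch gradient is corrected with an SVRG control variate before it is used, both for the step and for building the (s, y) correction pairs, so that the pairs are not dominated by sampling noise. This is a minimal sketch of a generic SVRG + L-BFGS combination, not the dissertation's lightweight method; grad_i, full_grad, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np
from collections import deque

def two_loop(g, pairs):
    """Standard L-BFGS two-loop recursion: approximates H^{-1} g
    from the stored (s, y) correction pairs (oldest first)."""
    q, alphas = g.copy(), []
    for s, y in reversed(pairs):
        a = np.dot(s, q) / np.dot(y, s)
        alphas.append(a)
        q -= a * y
    if pairs:
        s, y = pairs[-1]
        q *= np.dot(y, s) / np.dot(y, y)            # initial Hessian scaling
    for (s, y), a in zip(pairs, reversed(alphas)):
        q += (a - np.dot(y, q) / np.dot(y, s)) * s
    return q

def svrg_qn(grad_i, full_grad, w, n, m=50, mem=10, lr=0.1, epochs=3, rng=None):
    """Illustrative stochastic L-BFGS whose mini-batch gradients are
    corrected with an SVRG control variate, so the differences
    y = g_t - g_{t-1} are not dominated by sampling noise."""
    rng = rng or np.random.default_rng(0)
    pairs = deque(maxlen=mem)
    for _ in range(epochs):
        w_ref, g_ref = w.copy(), full_grad(w)        # SVRG anchor point
        w_prev, g_prev = None, None
        for _ in range(m):
            i = rng.integers(n)
            g = grad_i(w, i) - grad_i(w_ref, i) + g_ref   # variance-reduced gradient
            if w_prev is not None:
                s, y = w - w_prev, g - g_prev
                if np.dot(s, y) > 1e-10:             # keep the approximation positive definite
                    pairs.append((s, y))
            w_prev, g_prev = w.copy(), g
            w = w - lr * two_loop(g, list(pairs))
    return w
```

Finally, the cyclic learning rate schedule for Lookahead can be sketched as a Lookahead wrapper around plain SGD whose inner learning rate decays from a high to a low value within each cycle, so that only the interval [lr_low, lr_high] has to be chosen by hand. The cosine shape of the decay and all constants are assumptions for illustration; the single-cycle variant described above would instead ramp the rate up during a first stage and decay it during a second.

```python
import numpy as np

def lookahead_cyclic(grad_fn, w, steps, lr_low=0.01, lr_high=0.5,
                     cycle_len=200, k=5, alpha=0.5):
    """Illustrative Lookahead over plain SGD with a cyclic inner
    learning rate (high-to-low cosine decay within each cycle)."""
    slow = w.copy()                      # Lookahead's slow weights
    fast = w.copy()
    for t in range(steps):
        pos = (t % cycle_len) / cycle_len
        lr = lr_low + 0.5 * (lr_high - lr_low) * (1 + np.cos(np.pi * pos))
        fast -= lr * grad_fn(fast)       # inner (fast) SGD step
        if (t + 1) % k == 0:             # every k steps, pull the slow weights
            slow += alpha * (fast - slow)
            fast = slow.copy()           # restart fast weights from slow
    return slow
```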
Keywords/Search Tags: Stochastic, Optimization, Gradient Descent, Newton Method