
Efficient Stochastic Optimization Algorithms For Large-Scale PCA And SVD Problems

Posted on: 2022-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhang
Full Text: PDF
GTID: 2518306605466094
Subject: Master of Engineering
Abstract/Summary:
Principal component analysis (PCA) and singular value decomposition (SVD) reduce the dimensionality of large-scale, high-dimensional datasets in an interpretable manner while retaining most of the information in the data. The idea is simple: reduce the dimensionality of the dataset while preserving as much “variability” as possible. Here, “preserving as much variability as possible” can be recast as finding new variables that are linear functions of the variables in the original dataset, that successively maximize variance, and that are mutually uncorrelated. Finding such new variables, the principal components (top singular values or eigenvalues), reduces to solving an eigenvalue/eigenvector problem. However, traditional SVD and PCA methods usually require a full pass over the entire dataset at every iteration, which makes each iteration computationally expensive on large-scale data, while stochastic methods either update all coordinates of the selected sample in every iteration or come with complicated theoretical analyses and no parallel accelerated versions. Moreover, much of today's data contains sensitive information, such as financial and medical records, so differentially private PCA and differentially private SVD have been studied extensively. We aim to develop algorithms that analyze the data structure and compute low-rank approximations of the data while protecting the privacy of the information it contains. Existing differentially private PCA and SVD methods based on deterministic iterations have high iteration complexity, and their privacy-utility bounds and required noise levels need further improvement. This thesis therefore addresses these problems from two directions: stochastic optimization and differential privacy.

· Taking SVD as the example: deterministic SVD solvers use all samples in every iteration, so their iteration complexity is high, while plain stochastic
methods update all the coordinates of the selected sample. We propose a more efficient semi-stochastic block coordinate descent method for solving SVD, together with two asynchronous parallel variants for large-scale data. The proposed algorithms are analyzed theoretically in detail and then validated on real datasets and on images of different sizes. The main idea of the semi-stochastic block coordinate descent algorithm is as follows: in each inner-loop iteration, only a small block of coordinates of one randomly selected sample is updated, and this is combined with a periodic full-gradient computation at the outer-loop reference point to progressively reduce the variance; this computation pattern is well suited to large-scale and especially high-dimensional datasets. We then propose asynchronous parallel algorithms for large-scale dense and sparse datasets. The dense asynchronous algorithm uses multiple threads to parallelize both the full-gradient computation at the outer-loop reference point and the parameter updates of the inner loop, further reducing running time and achieving near-linear speedup. The sparse asynchronous algorithm additionally introduces a sparse approximation operator that sparsifies the full gradient at the reference point and updates only the nonzero coordinates of each sample during parallel computation, fully exploiting data sparsity to likewise achieve near-linear speedup.

· Stochastic methods are well suited to problems with many samples and high-dimensional data because of their low iteration complexity, and variance-reduced variants are particularly popular. However, the convergence analysis of such algorithms is complicated, so we propose a loopless stochastic variance reduction algorithm for solving PCA and SVD problems. Its main idea is to replace the original outer-loop mechanism with a coin flip: a probability p determines whether to recompute the full gradient at the reference
point in each iteration. This single-loop form not only makes the algorithm easier to understand but also simplifies its convergence analysis. Our experiments on real datasets confirm that the loopless method performs better, reduces the number of iterations, and is more robust to the choice of the probability p.

· For differentially private PCA and differentially private SVD, most existing methods are based on deterministic iterations such as power iteration, which have high computational complexity and require a relatively large amount of added noise. Taking PCA as the example, we propose two stochastic differentially private PCA mechanisms: stochastic differentially private PCA and stochastic variance-reduced differentially private PCA. In each iteration, a sample is selected at random and noise is added via gradient perturbation. Our theoretical analysis and experiments show that the resulting utility bounds are tighter and the magnitude of the added noise is smaller. In particular, the stochastic variance-reduced mechanism is experimentally more stable than the plain stochastic mechanism and offers better privacy protection.
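The inner update behind the semi-stochastic block coordinate descent idea can be sketched for the simplest case, estimating the leading eigenvector of the sample covariance (the top-1 PCA/SVD problem). This is a minimal illustration under stated assumptions, not the thesis's actual algorithm: the function name, step size, block size, and epoch lengths are all hypothetical, and the normalization step touches every coordinate, which a faithful block-coordinate implementation would handle more carefully.

```python
import numpy as np

def semi_stochastic_bcd_top_eigvec(X, block_size=2, epochs=60, m=100,
                                   eta=1e-3, seed=0):
    """Sketch: estimate the top eigenvector of A = X^T X / n.

    Outer loop: recompute the full gradient A @ w_ref at a reference point.
    Inner loop: pick one random sample and one random coordinate block, and
    apply an SVRG-style variance-reduced update to that block only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        w_ref = w.copy()
        mu = X.T @ (X @ w_ref) / n          # full gradient at the reference point
        for _ in range(m):
            x = X[rng.integers(n)]          # one random sample
            B = rng.choice(d, size=block_size, replace=False)  # coordinate block
            # variance-reduced stochastic gradient, restricted to block B:
            # (x x^T w - x x^T w_ref + A w_ref)[B]
            g_B = x[B] * (x @ w) - x[B] * (x @ w_ref) + mu[B]
            w[B] += eta * g_B
            w /= np.linalg.norm(w)          # keep the iterate on the unit sphere
    return w
```

Each inner step costs two length-d inner products plus an O(|B|) block update, rather than the O(nd) of a full power-iteration step; the asynchronous variants described above would additionally spread the outer full-gradient pass and these inner updates across threads.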
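The coin-flip mechanism of the loopless scheme replaces the nested outer loop with a single flat loop. The sketch below applies it to a variance-reduced stochastic power-style update for the top eigenvector; the function name, step size, and probability are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def loopless_vr_top_eigvec(X, p=0.1, eta=1e-3, iters=4000, seed=0):
    """Sketch: loopless variance-reduced stochastic PCA.

    Instead of a fixed-length inner loop, a coin flip with probability p
    decides at every iteration whether to refresh the reference point and
    its full gradient, so the algorithm is one single loop.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    w_ref = w.copy()
    mu = X.T @ (X @ w_ref) / n                # full gradient A w_ref
    for _ in range(iters):
        x = X[rng.integers(n)]
        # variance-reduced gradient: x x^T w - x x^T w_ref + A w_ref
        g = x * (x @ w) - x * (x @ w_ref) + mu
        w = w + eta * g
        w /= np.linalg.norm(w)                # stay on the unit sphere
        if rng.random() < p:                  # coin flip replaces the outer loop
            w_ref = w.copy()
            mu = X.T @ (X @ w_ref) / n
    return w
```

In expectation the full gradient is refreshed once every 1/p iterations, matching an inner-loop length of m = 1/p in the looped version, but the single loop is easier to state and, as noted above, to analyze.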
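The gradient-perturbation idea behind the two differentially private mechanisms can be sketched as follows. The noise scale sigma here is a hypothetical constant; an actual (epsilon, delta)-differential-privacy guarantee would calibrate sigma to the per-sample gradient sensitivity (bounded here by clipping each sample to unit norm) and to the number of iterations, which this sketch does not do.

```python
import numpy as np

def dp_stochastic_pca(X, sigma=0.5, eta=1e-2, iters=2000, seed=0):
    """Sketch: stochastic PCA with Gaussian gradient perturbation.

    Each iteration uses one random sample, clipped to norm <= 1 so the
    stochastic gradient has bounded sensitivity, and adds Gaussian noise
    to that gradient before the update.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1), 1.0)
    Xc = X / norms[:, None]                    # clip rows to norm <= 1
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(iters):
        x = Xc[rng.integers(n)]
        g = x * (x @ w)                        # stochastic gradient x x^T w
        g = g + sigma * rng.standard_normal(d) # gradient perturbation
        w = w + eta * g
        w /= np.linalg.norm(w)
    return w
```

The variance-reduced mechanism would add the same noise to an SVRG-style gradient instead, which, consistent with the findings summarized above, stabilizes the iterates at a given noise level.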
Keywords/Search Tags: principal component analysis, singular value decomposition, semi-stochastic variance reduction, loopless variance reduction, differential privacy