
Sparse Reduced Rank Regression By Dividing The Optimization

Posted on: 2022-07-16 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: R P Dong | Full Text: PDF
GTID: 1480306611955449 | Subject: Statistics
Abstract/Summary:
Multi-response regression is one of the most widely used statistical learning frameworks, with applications in natural language processing, recommender systems, biclustering analysis, and beyond. Unlike univariate-response regression, it models multiple response variables with the same set of predictors and estimates a coefficient matrix. In the big-data setting, however, large sample sizes and massive numbers of variables pose challenges for computation, estimation, and interpretation. In high-dimensional multi-response regression in particular, both the responses and the predictors can be massive, making the coefficient matrix too large to estimate directly. Computationally efficient and accurate estimation approaches are therefore urgently needed. This dissertation studies estimation and prediction for high-dimensional multi-response regression. We propose two approaches, one for row-sparse coefficient matrices and one for co-sparse coefficient matrices (sparse in both rows and columns), and provide theoretical guarantees for both.

Chapter 1 first reviews the challenges of high-dimensional statistics from the perspectives of noise accumulation, spurious correlation, and computational efficiency. It then introduces the sparsity assumption widely used in high-dimensional statistical inference, under which many regularization-based techniques have been proposed for estimation and inference, with substantial progress in both theory and algorithms. The chapter closes by surveying progress in high-dimensional multi-response regression and noting that computationally efficient estimators still lack theoretical guarantees; a fast method with such guarantees remains to be designed.

Chapter 2 considers multi-response regression with a row-sparse coefficient matrix and proposes a computationally efficient method, PEER, that estimates a row-sparse matrix with low-rank structure and scales to large numbers of both responses and predictors. Motivated by sparse factor regression, PEER converts the multi-response regression into a set of univariate-response regressions that can be solved efficiently in parallel. Under mild regularity conditions, PEER enjoys favorable sampling properties, including consistency in estimation, prediction, and variable selection. Extensive simulation studies show that it compares favorably with several existing methods in estimation accuracy, variable selection, and computational efficiency.

Factorizations restricted to sparse rows and columns of a large matrix are also fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been used in multivariate regression, factor analysis, biclustering, and vector time series modeling, among other settings. Their appeal lies in uncovering a highly interpretable latent association network, whether between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee or computationally intensive, rendering them unsuitable for large-scale studies. Chapter 3 therefore introduces CURE (co-sparse unit-rank estimation), a new algorithm for estimating a co-sparse coefficient matrix of unit rank (rank one). Motivated by the stagewise learning framework, CURE incrementally increases the model complexity with a given step size and produces a solution path for the coefficient matrix, where each solution corresponds to an individual penalty parameter; this makes the algorithm well suited to tuning by information criteria. We show that the CURE solution converges to the coordinate-wise optimal solution as the step size goes to zero, analyze the computational complexity of each update, and demonstrate the algorithm in numerical studies.

Chapter 4 introduces a deflation strategy that divides the general multi-response regression of rank greater than one into a set of co-sparse unit-rank estimation problems, thereby generalizing CURE from Chapter 3 to general co-sparse multi-response regression. Two deflation strategies are introduced, one sequential and one parallel, and the statistical convergence properties of both are studied. Theoretically, the sequential strategy requires weaker conditions than the parallel one because it needs no initial estimator. However, each sequential step regresses the current residual matrix on the predictors, so every CURE fit depends on the previous result and errors accumulate layer by layer. The parallel strategy instead divides the original problem into a set of CURE subproblems that can all be solved in parallel, bypassing the error accumulation. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of both approaches.

The last chapter discusses the drawbacks of stagewise learning and the importance of an adaptive step size in stagewise algorithms. A gap also remains between the algorithmic solutions and the theoretical analysis for multi-response regression, and closing it is important future work. In addition, the literature on inference for high-dimensional multi-response regression is scarce; inference on the eigenvectors and eigenvalues of the coefficient matrix is a valuable direction for future study.
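The "divide" step behind PEER can be illustrated with a minimal sketch. This is not the dissertation's actual PEER algorithm (which uses sparsity-penalized fits and comes with selection guarantees); it only shows how a low-rank multi-response problem splits into independent univariate-response regressions that could run in parallel. The function names and the plain least-squares subproblem solver are illustrative choices, not the author's.

```python
import numpy as np

def univariate_fit(X, y):
    """One univariate-response subproblem. PEER would use a
    sparsity-penalized fit here; plain OLS keeps the sketch short."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def divide_and_recombine(X, Y, rank):
    """Split a low-rank multi-response regression into `rank`
    univariate regressions, then recombine the results."""
    # Step 1: initial low-rank fit to get right factors V (q x rank)
    C0, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, Vt = np.linalg.svd(C0, full_matrices=False)
    V = Vt[:rank].T
    # Step 2: one univariate regression per latent response Y @ v_k;
    # these subproblems are independent, hence parallelizable
    U = np.column_stack(
        [univariate_fit(X, Y @ V[:, k]) for k in range(rank)]
    )
    # Step 3: recombine into the coefficient-matrix estimate
    return U @ V.T

rng = np.random.default_rng(1)
n, p, q = 150, 6, 5
B = rng.standard_normal((p, 2)) @ rng.standard_normal((2, q))  # true rank-2
X = rng.standard_normal((n, p))
Y = X @ B + 0.01 * rng.standard_normal((n, q))
C_hat = divide_and_recombine(X, Y, rank=2)
print(np.allclose(C_hat, B, atol=0.1))  # True
```

Because each latent response `Y @ v_k` yields its own regression, the per-direction fits could be dispatched to separate workers, which is the source of PEER's scalability in both responses and predictors.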
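The sequential deflation idea from Chapter 4 can likewise be sketched in a few lines. This is a simplified stand-in, not CURE itself: each unit-rank subproblem is solved here by a least-squares fit followed by a rank-one truncated SVD, with the co-sparsity penalties and stagewise updates omitted. The function names are mine.

```python
import numpy as np

def unit_rank_fit(X, R):
    """Best rank-one approximation of the least-squares coefficient
    matrix for responses R (a stand-in for one CURE subproblem)."""
    C_ls, *_ = np.linalg.lstsq(X, R, rcond=None)
    U, s, Vt = np.linalg.svd(C_ls, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0, :])

def sequential_deflation(X, Y, rank):
    """Divide a rank-`rank` multi-response regression into a sequence
    of unit-rank problems, each fit to the current residual; errors in
    early layers propagate to later ones, as the chapter notes."""
    C = np.zeros((X.shape[1], Y.shape[1]))
    R = Y.copy()
    for _ in range(rank):
        C_k = unit_rank_fit(X, R)  # solve one unit-rank subproblem
        C += C_k
        R = R - X @ C_k            # deflate: regress the residual next
    return C

rng = np.random.default_rng(0)
n, p, q, r = 200, 10, 8, 2
B = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # true rank-2
X = rng.standard_normal((n, p))
Y = X @ B + 0.01 * rng.standard_normal((n, q))
C_hat = sequential_deflation(X, Y, rank=2)
print(np.linalg.matrix_rank(C_hat))     # 2
print(np.allclose(C_hat, B, atol=0.1))  # True
```

The parallel strategy described in Chapter 4 would instead start from an initial estimator, split it into unit-rank pieces, and refit all pieces simultaneously, trading the weaker conditions of the sequential scheme for freedom from layer-by-layer error accumulation.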
Keywords/Search Tags: Factor regression, Low-rank matrix approximation, Variable selection, Regularization technique, Stagewise estimation, Matrix completion