
Penalized Regression Estimation Based On Parallel Computing

Posted on: 2022-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Zhang
GTID: 1480306347451954
Subject: Particle Physics and Nuclear Physics
Abstract/Summary:
In this dissertation, we focus on algorithms for certain regression analyses of large-scale data. We consider cases where traditional statistical methods are no longer efficient, or even feasible, because of the scale of the data. Advances in technology now allow large-scale data to be stored in full, but because the data are physically stored across different nodes, traditional statistical methods require new algorithms. Two types of problems are considered. First, we study how to solve high-dimensional or complex-structure problems on a single node whose memory permits some parallel computing for the classical statistical methods. Second, we propose distributed algorithms to process large-scale or high-dimensional data.

In Chapter 1, we introduce the related background, including quantile regression, the alternating direction method of multipliers (ADMM), the accelerated failure time (AFT) model, and subgroup analysis, together with the structure of the dissertation.

In Chapter 2, we propose DisQADMM, a distributed ADMM algorithm for penalized quantile regression with a broad class of penalties. With careful design, every subproblem in each DisQADMM iteration has a closed-form solution derived from the KKT conditions, which contributes to its promising performance. As by-products, we obtain an efficient distributed quantile regression solver for low-dimensional cases by setting the penalty term to zero, and a non-distributed but unified solver, named QADMM, for many high-dimensional quantile regression models; both may be of independent interest. We conduct an intensive numerical study to assess the performance of QADMM against several state-of-the-art methods, and that of DisQADMM on the Spark system. In the real case study, we use QADMM to estimate a quantile regression model coupled with the fused group MCP penalty in order to denoise images corrupted by impulsive noise. Related code is available at https://github.com/ponda-donut/QRADMM.

In Chapter 3, we develop a constructive approach for l0 penalized estimation in the sparse AFT model with high-dimensional covariates. The proposed method is based on Stute's weighted least squares criterion combined with l0 penalization. It is a computational algorithm that generates a sequence of solutions iteratively, based on active sets derived from primal and dual information and on root finding according to the KKT conditions. We refer to the proposed method as AFT-SDAR (for support detection and root finding). An important aspect of our theoretical results is that they directly concern the sequence of solutions generated by the AFT-SDAR algorithm. We prove that the estimation errors of the solution sequence decay exponentially to the optimal error bound with high probability, as long as the covariate matrix satisfies a mild regularity condition that is necessary and sufficient for model identification even in the setting of high-dimensional linear regression. We also propose an adaptive version of AFT-SDAR, called AFT-ASDAR, which determines the support size of the estimated coefficient in a data-driven fashion. Simulation studies demonstrate the superior performance of the proposed method over the lasso and MCP in terms of accuracy and speed. We also apply the proposed method to a real data set to illustrate its use.

In Chapter 4, we consider subgroup analysis in a distributed manner. With the development of science and technology, governments and institutions are vigorously promoting health information exchange (HIE) for precise medical treatment, where the analysis of heterogeneous treatment effects has become important. In this chapter, we focus on identifying subgroups by combining data in a distributed storage system while ensuring the privacy and security of the data. We propose the DisSRADMM algorithm, based on the alternating direction method of multipliers. This method can not only handle large-scale data but also tolerate unbalanced subgroups, which is potentially useful for detecting unknown infectious diseases. Numerical studies are used to assess the performance of the proposed method. Our framework is suitable for solving a range of regression problems with different heterogeneous structures, even in distributed settings.

In Chapter 5, we summarize the results presented in the dissertation and discuss possible directions for future research.
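
To give a concrete sense of what a closed-form ADMM subproblem can look like in penalized quantile regression (the setting of Chapter 2), the sketch below implements two standard scalar updates: the proximal map of the quantile check loss and soft-thresholding for an l1 penalty. It is a minimal illustration under stated assumptions, not the DisQADMM or QADMM algorithm from the dissertation; the function names and the parameters `tau` (quantile level), `sigma` (ADMM augmentation parameter) and `lam` (penalty weight) are hypothetical choices made for this example.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def prox_check_loss(v, tau, sigma):
    """Closed-form minimizer over u of rho_tau(u) + (sigma / 2) * (u - v)**2.

    Setting the subgradient to zero gives three cases:
    u = v - tau / sigma        if v >  tau / sigma,
    u = v + (1 - tau) / sigma  if v < -(1 - tau) / sigma,
    u = 0                      otherwise.
    """
    return max(v - tau / sigma, 0.0) + min(v + (1.0 - tau) / sigma, 0.0)


def soft_threshold(v, lam, sigma):
    """Closed-form minimizer over b of lam * |b| + (sigma / 2) * (b - v)**2."""
    t = lam / sigma
    return max(v - t, 0.0) + min(v + t, 0.0)


if __name__ == "__main__":
    # Median regression (tau = 0.5): residuals are shrunk toward zero by 0.5 / sigma.
    for v in (-2.0, 0.2, 1.0):
        print(v, prox_check_loss(v, tau=0.5, sigma=1.0))
```

Because both updates act coordinate by coordinate, they can be applied independently to each residual or coefficient, which is one reason closed-form subproblems suit parallel and distributed implementations.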
Keywords/Search Tags: Penalized Estimation, DisQADMM, Big data, Quantile regression, Distributed computation, Alternating Direction Method of Multipliers, DisSRADMM, privacy preservation, subgroup analysis, Censored data, l0 penalization, KKT condition