Penalized Regression Estimation Based On Parallel Computing

Posted on:2022-01-27

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S Zhang

GTID:1480306347451954

Subject:Particle Physics and Nuclear Physics

Abstract/Summary:

In this dissertation, we focus on algorithms for regression analysis of large-scale data. We consider cases where traditional statistical methods are no longer efficient, or even feasible, because of the scale of the data. Advances in technology now allow large-scale data to be stored in full, but because the data are physically stored on different nodes, traditional statistical methods require new algorithms. Two types of problems are considered. First, we study how to solve high-dimensional or complex-structured problems on a single node, when its memory allows some parallel computing for the original classical statistical methods. Second, we propose distributed algorithms to process large-scale or high-dimensional data.

In Chapter 1, we introduce the related background, including quantile regression, the alternating direction method of multipliers (ADMM), the accelerated failure time (AFT) model, and subgroup analysis, together with the structure of the dissertation.

In Chapter 2, we propose DisQADMM, a distributed ADMM algorithm for penalized quantile regression with a broad class of penalties. With careful design, all the subproblems in each iteration of DisQADMM have closed-form solutions based on the KKT conditions, which underlies its promising performance. As by-products, we obtain not only an efficient distributed quantile regression solver for low-dimensional cases, by setting the penalty term to zero, but also a non-distributed but unified solver, named QADMM, for many high-dimensional quantile regression models; both may be of independent interest. We conduct an extensive numerical study to assess the performance of QADMM against several state-of-the-art methods, and that of DisQADMM on the Spark system. In a real case study, we use QADMM to estimate a quantile regression model coupled with the fused group MCP penalty in order to denoise images corrupted by impulsive noise. Related code is available at https://github.com/ponda-donut/QRADMM.

In Chapter 3, we develop a constructive approach for l0-penalized estimation in the sparse AFT model with high-dimensional covariates. The proposed method is based on Stute’s weighted least squares criterion combined with l0 penalization. It is a computational algorithm that iteratively generates a sequence of solutions, based on active sets derived from primal and dual information and on root finding according to the KKT conditions. We refer to the proposed method as AFT-SDAR (for support detection and root finding). An important aspect of our theoretical results is that they directly concern the sequence of solutions generated by the AFT-SDAR algorithm. We prove that, with high probability, the estimation errors of the solution sequence decay exponentially to the optimal error bound, as long as the covariate matrix satisfies a mild regularity condition that is necessary and sufficient for model identification even in the setting of high-dimensional linear regression. We also propose an adaptive version of AFT-SDAR, called AFT-ASDAR, which determines the support size of the estimated coefficient in a data-driven fashion. Simulation studies demonstrate the superior performance of the proposed method over the lasso and MCP in terms of accuracy and speed. We also apply the proposed method to a real data set to illustrate its application.

In Chapter 4, we consider subgroup analysis in a distributed manner. With the development of science and technology, governments and institutions are vigorously promoting health information exchange (HIE) for precise medical treatment, where the analysis of heterogeneous treatment effects has become important. In this chapter, we focus on identifying subgroups by combining data in a distributed storage system while ensuring the privacy and security of the data. We propose the DisSRADMM algorithm, based on the alternating direction method of multipliers. This method can not only deal with large-scale data but also tolerate unbalanced subgroups, which is potentially useful for detecting unknown infectious diseases. Numerical studies are used to assess the performance of the proposed method. Our framework is suitable for solving a series of regression problems with different heterogeneous structures, even in distributed settings.

In Chapter 5, we summarize the results presented in the dissertation and discuss possible directions for future research.
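
To illustrate the kind of closed-form ADMM updates mentioned for Chapter 2, here is a minimal single-node sketch for unpenalized quantile regression. It is not the dissertation's DisQADMM or QADMM. The splitting r = y - X beta, the function names `prox_check_loss` and `quantile_admm`, the step size `sigma`, and the iteration count are all assumptions made for illustration. Each iteration consists of a least-squares solve, an elementwise closed-form proximal step for the check loss, and a dual update.

```python
import numpy as np

def prox_check_loss(v, tau, sigma):
    """Elementwise closed-form minimizer of rho_tau(r) + (sigma/2)*(r - v)**2,
    where rho_tau(r) = r * (tau - 1{r < 0}) is the quantile check loss."""
    return (np.maximum(v - tau / sigma, 0.0)
            + np.minimum(v + (1.0 - tau) / sigma, 0.0))

def quantile_admm(X, y, tau=0.5, sigma=1.0, n_iter=1000):
    """Hypothetical helper: scaled-form ADMM for
    min_beta sum_i rho_tau(r_i)  subject to  X beta + r = y."""
    n, p = X.shape
    beta = np.zeros(p)
    r = np.zeros(n)   # residual block
    u = np.zeros(n)   # scaled dual variable
    for _ in range(n_iter):
        # beta-step: ordinary least squares on the shifted response
        beta, *_ = np.linalg.lstsq(X, y - r - u, rcond=None)
        # r-step: closed-form proximal operator of the check loss
        r = prox_check_loss(y - X @ beta - u, tau, sigma)
        # dual step: accumulate the constraint violation X beta + r - y
        u = u + X @ beta + r - y
    return beta

# Intercept-only toy data (made up for this sketch): the tau = 0.5 fit
# should approach the sample median of y.
X = np.ones((5, 1))
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
beta_hat = quantile_admm(X, y, tau=0.5)
print(beta_hat)
```

In a distributed variant, one would expect the rows of X and y to be partitioned across nodes with a consensus constraint on beta, and a penalty term would add a further proximal step. Those details are specific to the dissertation and are not reproduced here.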

Keywords/Search Tags:

Penalized Estimation, DisQADMM, Big data, Quantile regression, Distributed computation, Alternating Direction Method of Multipliers, DisSRADMM, privacy preservation, subgroup analysis, Censored data, l0 penalization, KKT condition

Related items

1	Distributed Quantile Regression Algorithms And Applications
2	Statistical Inference Based On Quantile Regression Models And Its Application With Complex Data
3	Fast Sparse Multinomial Logistic Regression And Distributed Parallelism
4	Parameter Estimation And Inference For Regression Models With Censored Data
5	Statistical Inference Of Semiparametric Quantile Regression Model With Interval Censored Data
6	Quantile Regression Estimation Of Partially Varying-coefficient Linear Model With Right Censored Data
7	Algorithm Research On Constrained Optimization Problems Governed By Elliptic Equations
8	A Study On Some Problems Of Alternating Direction Method Of Multipliers
9	Some Theoretical Research Of The Generalized Alternating Direction Method Of Multipliers
10	Semi-Proximal Alternating Direction Method Of Multipliers For Sparse Inverse Covariance Matrices Estimation