Variable Selection For Generalized Linear Model With Highly Correlated Predictors

Posted on:2022-10-17

Degree:Master

Type:Thesis

Country:China

Candidate:W T Wang

Full Text:PDF

GTID:2507306764994869

Subject:Journalism and Media

Abstract/Summary:

The generalized linear model is an important extension of the classical linear model. It extends the distribution of the response variable to the exponential family, so that it can handle discrete data effectively, which has made the model widely used in medicine, economics and other fields. With the rapid development of big data, the data that can be obtained and stored have become increasingly abundant, and the application scenarios of generalized linear models have become increasingly diverse. However, because such data are high-dimensional, traditional statistical methods and theory are difficult to apply. Penalty-based variable selection methods select variables and estimate coefficients simultaneously and yield a more interpretable model, which overcomes the difficulties of high-dimensional data. However, when the covariates are highly correlated, the irrepresentable condition is not satisfied, and variable selection becomes inconsistent. Although there is a large literature on variable selection with strongly correlated covariates, few studies address the problem for generalized linear models. It is therefore of both theoretical significance and practical value to study variable selection for generalized linear models with highly correlated predictors. In this thesis, inspired by the Semi-standard PArtial Covariance (SPAC) method proposed by Xue and Qu (2017) for linear models, we propose the Generalized Linear Model Semi-standard PArtial Covariance (GLM-SPAC) method for sparse generalized linear models with highly correlated predictors. SPAC reduces the correlation effect from other covariates while incorporating the magnitude of the coefficients. First, GLM-SPAC estimates the diagonal elements of the precision matrix to obtain the relationship between the SPACs and the regression parameters. Second, the method substitutes the SPACs for the regression parameters in the penalized likelihood function, so that the penalty is applied to the SPACs. Finally, the estimator of the regression parameters is obtained from the estimator of the SPACs. Furthermore, it is shown that, under some regularity conditions, the proposed method with the Lasso penalty (SPAC-Lasso) enjoys strong sign consistency in high-dimensional settings, even if the irrepresentable condition does not hold. Simulation studies and a real data analysis are also carried out to assess the performance of the proposed methods.
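
The three steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes a logistic model without intercept, assumes more observations than predictors so that the sample covariance can be inverted directly (the thesis targets high-dimensional settings, where a dedicated precision-matrix estimator would be needed), and assumes the relationship gamma_j = beta_j * s_j with s_j = 1 / sqrt(d_jj), where d_jj is the j-th diagonal element of the precision matrix of the predictors. The function names, the proximal-gradient solver and the tuning value are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def spac_lasso_logistic(X, y, lam, n_iter=2000):
    """Hypothetical sketch: L1 penalty on SPACs for a logistic model, no intercept."""
    n, p = X.shape

    # Step 1: diagonal of the precision matrix of the predictors.
    # Assumption: n > p, so the sample covariance is invertible.
    d = np.diag(np.linalg.inv(np.cov(X, rowvar=False)))
    s = 1.0 / np.sqrt(d)  # assumed link: gamma_j = beta_j * s_j

    # Step 2: substitute beta_j = gamma_j / s_j, so that X @ beta == Z @ gamma,
    # and penalize gamma instead of beta (solved here by proximal gradient).
    Z = X / s
    step = 4.0 * n / np.linalg.norm(Z, 2) ** 2  # 1 / bound on the gradient's Lipschitz constant
    gamma = np.zeros(p)
    for _ in range(n_iter):
        eta = np.clip(Z @ gamma, -30.0, 30.0)  # guard against overflow in exp
        mu = 1.0 / (1.0 + np.exp(-eta))
        grad = Z.T @ (mu - y) / n
        gamma = soft_threshold(gamma - step * grad, step * lam)

    # Step 3: recover the regression parameters from the estimated SPACs.
    beta = gamma / s
    return gamma, beta, s


# Usage on simulated, strongly correlated predictors (all values are illustrative).
rng = np.random.default_rng(0)
n, p = 400, 6
X = 0.9 * rng.normal(size=(n, 1)) + 0.45 * rng.normal(size=(n, p))
beta_true = np.array([2.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
gamma_hat, beta_hat, s_hat = spac_lasso_logistic(X, y, lam=0.02)
```

Because the penalty acts on gamma rather than beta, a predictor that is almost a linear combination of the others has a small s_j, so a given coefficient beta_j translates into a small gamma_j and is shrunk to zero more readily. This is one way to read the abstract's statement that SPAC reduces the correlation effect from other covariates while incorporating the magnitude of the coefficients.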

Keywords/Search Tags:

Irrepresentable condition, Generalized linear model, Variable selection, Model selection consistency, Lasso

Related items

1	Variable Selection Of Complex Data Joint Model Based On Improved Lasso Method
2	Research On Improved Linear And Nonlinear Variable Selection Methods
3	Fast, Adaptive And Selection-effective Variable Selection Methods For Artificial Neural Networks And Nonparametric Additive Models
4	Research On Sequential Adaptive Variables And Subject Selection
5	Theoretical Research And Empirical Analysis Of Variable Selection In Spatial Autoregressive Model
6	Variable Selection For Partially Linear Spatial Autoregressive Models With A Diverging Number Of Parameters
7	Estimation And Variable Selection For Function-on-scalar Linear Regression Model
8	Studies Of Some Statistical Issues In Censored Regression Model
9	Regression Analysis And Variable Selection For Generalized Odds Rate Model With Interval-censored Failure Time Data
10	Study On The Parameter Estimation And Robust Variable Selection For Linear Model