| The generalized linear model is an important extension of the classical linear model. It extends the distribution of the response variable to the exponential family, so that discrete data can be handled effectively, and the model is therefore widely used in medicine, economics, and other fields. With the rapid development of big data, the data that can be collected and stored are increasingly abundant, and the application scenarios of generalized linear models are increasingly diverse. However, because such data are often high-dimensional, traditional statistical methods and theory are difficult to apply. Penalty-based variable selection methods select variables and estimate coefficients simultaneously and yield a more interpretable model, which overcomes some of the difficulties of high-dimensional data. However, when the covariates are highly correlated, the irrepresentable condition is not satisfied, and variable selection becomes inconsistent. Although there is an extensive literature on variable selection with strongly correlated covariates, few studies address this problem for generalized linear models. It is therefore of theoretical significance and practical value to study variable selection for generalized linear models with highly correlated predictors. In this thesis, inspired by the Semi-standard PArtial Covariance (SPAC) method proposed by Xue and Qu (2017) for linear models, we propose the Generalized Linear Model Semi-standard PArtial Covariance (GLM-SPAC) method for sparse generalized linear models with highly correlated predictors. SPAC reduces the correlation effect from other covariates while incorporating the magnitude of the coefficients. First, GLM-SPAC estimates the diagonal elements of the precision matrix to obtain the relationship between the SPACs and the regression parameters. Second, the method substitutes this relationship for the parameters in the penalized likelihood function, so that the SPACs are penalized. Finally, it obtains the estimator of the regression parameters from the estimator of the SPACs. Furthermore, it is shown that, under some regularity conditions, the proposed method with the Lasso penalty (SPAC-Lasso) enjoys strong sign consistency in high-dimensional settings, even if the irrepresentable condition does not hold. Simulation studies and a real data analysis are also carried out to assess the performance of the proposed methods. |
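
The three steps described in the abstract can be sketched in symbols. This is only an illustrative sketch under stated assumptions, not the thesis's own formulation: the abstract does not give the definitions, so the SPAC is assumed here to take the linear-model form $\gamma_j = \beta_j / \sqrt{d_{jj}}$, where $d_{jj}$ is the $j$-th diagonal element of the precision matrix of the covariates, and the log-likelihood $\ell$ is assumed to come from a canonical-link GLM with cumulant function $b$. The symbols $\gamma_j$, $d_{jj}$, $\ell$, $b$, and $\lambda$ are notation introduced for this sketch.

```latex
% Assumed setup (not stated in the abstract): canonical-link GLM log-likelihood
\ell(\beta) = \sum_{i=1}^{n} \left\{ y_i \, x_i^{\top}\beta - b\!\left(x_i^{\top}\beta\right) \right\}

% Step 1: estimate the diagonal of the precision matrix, giving \hat{d}_{jj},
% and use the assumed SPAC relationship
\gamma_j = \frac{\beta_j}{\sqrt{d_{jj}}}
\quad\Longleftrightarrow\quad
\beta_j = \sqrt{d_{jj}} \, \gamma_j , \qquad j = 1, \dots, p

% Step 2: substitute into the penalized likelihood and penalize the SPACs (Lasso penalty)
\hat{\gamma} = \arg\min_{\gamma} \left\{ -\frac{1}{n} \,
  \ell\!\left( \sqrt{\hat{d}_{11}} \, \gamma_1 , \dots , \sqrt{\hat{d}_{pp}} \, \gamma_p \right)
  + \lambda \sum_{j=1}^{p} \lvert \gamma_j \rvert \right\}

% Step 3: recover the regression parameters from the estimated SPACs
\hat{\beta}_j = \sqrt{\hat{d}_{jj}} \, \hat{\gamma}_j , \qquad j = 1, \dots, p
```

Under these assumptions, Step 2 is equivalent to an ordinary Lasso-penalized GLM fit after rescaling the $j$-th covariate column by $\sqrt{\hat{d}_{jj}}$, so a predictor that is nearly collinear with the others (large $d_{jj}$) is penalized more heavily on the original coefficient scale.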