Variable Screening Of Regression Models With Missing Data At Random

Posted on: 2020-10-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Xia
Full Text: PDF
GTID: 1487305762462204
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of data collection and storage technology, fields such as biostatistics, econometrics, bioinformatics and sociology now collect large amounts of complicated data. High-dimensional data are a typical example: the number of variables (the dimension) in observed biomedical data is allowed to grow much faster than the sample size, so traditional statistical inference methods no longer apply. In high-dimensional data, missing values are common, and much of the information is noise rather than useful signal. How to select all important features or variables from a large candidate set without losing information, and so achieve effective dimensionality reduction, has therefore become the key to handling high-dimensional data. In recent years, many new dimensionality reduction methods have been proposed, such as variable selection and variable (feature) screening. Existing results focus mainly on variable selection for high-dimensional parametric models and on variable screening with complete data. There are few studies of variable selection for ultra-high-dimensional semi-parametric models, or of variable screening with missing data for ultra-high-dimensional models (where the dimension of the predictors is allowed to grow at an exponential rate with the sample size). To this end, this paper studies variable selection for ultra-high-dimensional partial linear models from the perspectives of algorithm design and penalty-based asymptotic theory. It also studies variable screening for ultra-high-dimensional models when the response or the covariates are missing at random, covering the screening method, the selection of the tuning parameter, and the screening properties. The research content of this paper is summarized as follows:

1. Parameter estimation based on profile least squares and kernel estimation is studied when the dimension of the covariates diverges with the sample size, and the existence and asymptotic normality of the estimator are established under some regularity conditions. Variable selection for ultra-high-dimensional partial linear models is studied using the concave-convex procedure (CCCP) for nonconvex penalty functions together with the alternating direction method of multipliers (ADMM). The resulting CCCP-ADMM algorithm is insensitive to the initial value and computationally fast. The oracle property is proved for a general class of nonconvex penalty functions in the presence of ultra-high-dimensional covariates under some regularity conditions. Simulation studies illustrate the feasibility and effectiveness of the proposed method and the designed algorithm.

2. A feature screening method based on profile marginal estimating equations and kernel imputation is proposed for ultra-high-dimensional partial linear models with responses missing at random. The method does not require a specific form for the propensity score function and is robust to misspecification of the missing-data mechanism. The ranking consistency property and the sure screening property are established under some regularity conditions. Simulation studies and a real data analysis illustrate the effectiveness of the proposed screening procedure.

3. A robust variable screening method based on the distance correlation coefficient and kernel imputation is proposed for the case where the response is missing at random and the missing-probability model is low-dimensional. The proposal is model-free: it does not assume any specification of the regression model. In addition, a modified tuning parameter selection algorithm based on the maximum ratio criterion is proposed. It can select a smaller model containing all the truly active variables in the missing-mechanism model, and thereby effectively avoids the curse of dimensionality when kernel imputation is used. The sure screening properties are established under some regularity conditions. Simulation studies and a real data analysis illustrate the feasibility and effectiveness of the proposed method and the designed algorithm.

4. A robust variable screening method based on the distance correlation coefficient and inverse probability weighting is proposed for the case where both the response and the predictors are missing at random and the missing-probability model is a sparse ultra-high-dimensional logistic model. The proposal is model-free: it does not assume any specification of the regression model. In addition, the sure screening property is established under some regularity conditions. Simulation studies illustrate the feasibility and effectiveness of the proposed method.
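To make the idea of distance-correlation screening concrete, the following is a minimal sketch and not the dissertation's procedure. It ranks predictors by their sample distance correlation with the response and keeps the top `d`. As a simplifying assumption it uses only the cases with an observed response, whereas the methods summarized above rely on kernel imputation or inverse probability weighting. The function names, the cutoff `d`, and the simulated data are all hypothetical.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two 1-D arrays."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom <= 0:
        return 0.0  # a constant input carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / denom))

def screen_complete_cases(X, y, d):
    """Rank columns of X by distance correlation with y; keep the top d.

    Assumption (not from the source): rows with a missing response (NaN)
    are simply dropped instead of being imputed or reweighted.
    """
    observed = ~np.isnan(y)
    Xo, yo = X[observed], y[observed]
    scores = np.array([distance_correlation(Xo[:, j], yo)
                       for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:d]
    return top, scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] + X[:, 3] ** 2 + rng.normal(scale=0.5, size=n)
    # Response missing at random: missingness depends only on observed X[:, 1].
    prob_observed = 1.0 / (1.0 + np.exp(-(1.0 + X[:, 1])))
    y[rng.random(n) > prob_observed] = np.nan
    top, scores = screen_complete_cases(X, y, d=10)
    print("retained predictor indices:", sorted(top.tolist()))
```

Because distance correlation is zero only under independence, the ranking needs no regression model, which is the sense in which such screening is called model-free. A complete-case ranking can be distorted when missingness depends on the covariates, which is the motivation for the imputation and weighting corrections described in items 3 and 4.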
Keywords/Search Tags: High-dimensional data, Missing at random, Variable selection, Variable (feature) screening, Distance correlation