
Variable Screening For Statistical Models With Ultrahigh Dimensional Data

Posted on: 2018-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y Zhang
Full Text: PDF
GTID: 1360330542468361
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Ultrahigh dimensional data arise widely in biomedicine, economics and finance, actuarial science, reliability engineering and other fields. Due to the curse of dimensionality, traditional variable selection methods cannot achieve statistical accuracy in this setting. In ultrahigh dimensional data, the dimensionality p is allowed to diverge with the sample size n at a non-polynomial (NP) rate, namely log p = O(n^κ) for some 0 < κ < 1. Because the design matrix is singular, the variable signals cannot be recovered by least squares estimation, and even under a sparsity assumption, variable shrinkage methods such as LASSO, SCAD, adaptive LASSO and MCP cannot select variables accurately. Variable selection with ultrahigh dimensional data has therefore become a central issue for statisticians. To tackle these problems, Fan and Lv (2008) [38] introduced the sure independence screening (SIS) method, which selects important variables in ultrahigh dimensional linear regression models via marginal correlation learning. Extending the ideas of Fan and Lv (2008) [38], this dissertation studies variable screening for: quantile varying-coefficient models via marginal correlation learning; empirical likelihood variable screening; variable screening for nonparametric and semiparametric models based on kernel regression estimation; Gini correlation variable screening; and sequential Lasso variable screening for generalized linear models.

In Section 2, we consider quantile regression with varying-coefficient models. We extend marginal screening methods to examine and select variables by ranking a measure of the nonparametric marginal contribution of each covariate given the exposure variable. Spline approximations are employed to model the marginal effects and to select the set of active variables in a quantile-adaptive framework, which ensures the sure screening property for quantile-adaptive varying-coefficient models. Numerical studies demonstrate that the proposed procedure works well for heteroscedastic data.

In Section 3, we investigate marginal empirical likelihood screening methods for ultrahigh dimensional additive models. The proposed nonparametric screening method selects variables by ranking a measure of the marginal empirical likelihood ratio evaluated at zero, which differentiates the contribution each covariate makes to the response variable. We show that, under mild technical conditions, the proposed marginal empirical likelihood screening method enjoys the sure screening property, and we explicitly quantify the extent to which the dimensionality can be reduced. We also propose a data-driven thresholding rule and an iterative marginal empirical likelihood method to enhance the finite-sample performance for fitting sparse additive models. Simulation results and real data analysis demonstrate that the proposed methods are competitive and outperform the competing methods in heteroscedastic scenarios.

In Section 4, we propose a new feature screening procedure based on conditional expectation, which determines whether an explanatory variable contributes to the response variable without requiring a specific parametric form of the underlying data model. We estimate the marginal conditional expectation by a kernel regression estimator and show that the proposed method possesses the sure screening property. We also propose an iterative kernel estimator algorithm to reduce the ultrahigh dimensionality to an appropriate scale. Simulation results and real data analysis demonstrate that the proposed method works well and performs better than competing methods.
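As an illustration of the kernel-based marginal screening idea in Section 4, the following is a minimal sketch that estimates E[Y | X_j] for each covariate with a Nadaraya-Watson kernel regression and ranks covariates by the empirical variance of the fitted marginal curve. The Gaussian kernel, the rule-of-thumb bandwidth, and the variance-based ranking statistic are illustrative assumptions, not the dissertation's exact procedure.

```python
import numpy as np

def nw_fit(x, y, h):
    """Nadaraya-Watson estimate of E[Y | X_j = x_i] at the observed points,
    using a Gaussian kernel with bandwidth h."""
    diffs = (x[:, None] - x[None, :]) / h        # pairwise scaled differences
    weights = np.exp(-0.5 * diffs ** 2)          # Gaussian kernel weights
    return (weights @ y) / weights.sum(axis=1)

def kernel_screen(X, y, d, h=None):
    """Rank covariates by the empirical variance of the fitted marginal
    regression E[Y | X_j] and keep the top d (illustrative utility)."""
    n, p = X.shape
    if h is None:
        h = 1.06 * n ** (-1 / 5)                 # rule-of-thumb bandwidth for standardized X_j
    utilities = np.empty(p)
    for j in range(p):
        xj = (X[:, j] - X[:, j].mean()) / X[:, j].std()
        m_hat = nw_fit(xj, y, h)
        utilities[j] = np.mean((m_hat - y.mean()) ** 2)
    # indices of the d covariates whose marginal fit varies the most
    return np.argsort(utilities)[::-1][:d]
```

A covariate unrelated to the response yields a nearly flat marginal fit and hence a small utility, so ranking by this utility separates active from inactive covariates; an iterative version would rescreen after conditioning on the variables already selected.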
In Section 5, we consider sequential iterative Lasso (SILasso) variable selection for generalized linear models with an ultrahigh dimensional feature space. SILasso selects features by estimating parameters sequentially and iteratively for a second-order approximation of the likelihood function; the selected features depend on a tuning parameter that decreases from large to small and is chosen by the extended Bayesian information criterion (EBIC). The procedure stops when EBIC reaches its minimum. A simulation study demonstrates that the new method is a desirable alternative to existing methods.
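As a simplified illustration of the EBIC-guided stopping rule described above (not the SILasso algorithm itself), the sketch below fits an ordinary Lasso for a decreasing grid of tuning parameters on a Gaussian linear model and stops when the extended BIC, EBIC_γ(S) = n log(RSS/n) + |S| log n + 2γ|S| log p, starts to increase. The grid, γ = 0.5, and the use of scikit-learn's Lasso in place of the second-order likelihood approximation are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def ebic(rss, n, p, s, gamma=0.5):
    """Extended BIC for a Gaussian linear model with s selected variables."""
    return n * np.log(rss / n) + s * np.log(n) + 2.0 * gamma * s * np.log(p)

def lasso_ebic_path(X, y, alphas, gamma=0.5):
    """Fit the Lasso for tuning parameters from large to small and
    return the support chosen at the EBIC minimum."""
    n, p = X.shape
    best_crit, best_support = np.inf, np.array([], dtype=int)
    for alpha in alphas:                          # decreasing grid of tuning parameters
        model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
        support = np.flatnonzero(model.coef_)
        rss = np.sum((y - model.predict(X)) ** 2)
        crit = ebic(rss, n, p, support.size, gamma)
        if crit < best_crit:
            best_crit, best_support = crit, support
        else:
            break                                 # stop once EBIC starts to increase
    return best_support

# toy example with p >> n and two active covariates
rng = np.random.default_rng(1)
n, p = 100, 500
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)
print(lasso_ebic_path(X, y, np.geomspace(1.0, 0.01, 20)))
```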
Keywords/Search Tags: Ultrahigh dimensional data, Quantile regression, Varying coefficient model, Variable screening, Variable selection, Variable shrinkage methods, Additive model, Gini correlation, Generalized linear model, Empirical likelihood ratio, Sure screening property