
Variable Selection And Sparse Regularization In High-dimensional Models

Posted on: 2014-03-13    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y X Wan    Full Text: PDF
GTID: 1310330398455391    Subject: Applied Mathematics
Abstract/Summary:
Variable selection is an important part of statistical modeling. To analyze a problem fully, researchers tend to collect as many variables related to the research question as possible. If there are too many variables, however, the model becomes more complex, and both its interpretability and its predictive ability are reduced. Variable selection is therefore an extremely important issue in statistics: it not only improves the predictive performance of the model but also helps us better understand the internal structure of the data itself. Especially for high-dimensional data, extracting the relevant features or variables from the abundant available information is key to statistical modeling, and variable selection for high-dimensional data has become one of the hot topics in high-dimensional data analysis. The most effective and commonly used approach to inducing sparsity in high-dimensional data is sparse regularization based on penalty functions, which selects variables and estimates parameters simultaneously. Sparse regularization methods for high-dimensional data therefore have great theoretical significance and application value. In this thesis, we study variable selection in high-dimensional models from the perspectives of regularization-term construction, regularization-parameter selection, algorithm design, and asymptotic theory. The main research work and results are as follows.

First, we construct the fractional absolute differentiable (FAD) concave penalty function for variable selection and parameter estimation. With a proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. We use the local quadratic approximation (LQA) algorithm to solve the regularized model.
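The abstract does not give the exact form of the FAD penalty, so the following is only an illustrative sketch of the LQA algorithm applied to the standard SCAD penalty; the FAD penalty would slot in via its own derivative. LQA replaces the concave penalty at each step by a local quadratic upper bound, so every iteration reduces to solving a ridge-type linear system.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty (standard form, a = 3.7)."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def lqa(X, y, lam, n_iter=50, eps=1e-8, tol=1e-6):
    """Local quadratic approximation for penalized least squares.

    Each step approximates p_lam(|b_j|) by a quadratic around the current
    iterate, turning the problem into a weighted ridge regression.
    """
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting value
    for _ in range(n_iter):
        # Quadratic weights p'_lam(|b_j|) / |b_j| from the LQA expansion
        d = scad_deriv(beta, lam) / (np.abs(beta) + eps)
        beta_new = np.linalg.solve(X.T @ X / n + np.diag(d), X.T @ y / n)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    beta[np.abs(beta) < 1e-4] = 0.0   # prune coefficients driven to ~0
    return beta
```

A known drawback of LQA, which a pruning step like the one above works around, is that once a coefficient is shrunk to (numerically) zero it cannot re-enter the model.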
Simulation studies show that the proposed FAD regularization method has smaller model error and higher prediction accuracy than the LASSO, SCAD, and MCP methods.

Second, we propose a weighted LAD-SCAD regularization method that is robust to heavy-tailed errors and to outliers in the explanatory variables of the linear regression model. The weighted least absolute deviation (WLAD) regression estimation method and the smoothly clipped absolute deviation (SCAD) penalty are combined to achieve robust parameter estimation and variable selection simultaneously. The weight function is constructed based on the concept of a "decontamination subset". In theory, we first prove that the LAD-SCAD estimator has the oracle property for the linear regression model with a diverging number of parameters, and then investigate the properties of the WLAD-SCAD estimator. We compute the WLAD-SCAD estimator with the LQA algorithm and select the regularization parameter by the BIC criterion.

Third, we propose an exponential penalty function, the EXP penalty, as a continuous approximation to the L0 penalty. The EXP penalized least squares procedure is shown to consistently select the correct model and to be asymptotically normal. We propose a modified BIC (MBIC) tuning-parameter selection method for EXP and show that it consistently identifies the correct model while allowing the number of variables to diverge. EXP is efficiently implemented using a coordinate descent (CD) algorithm and an iteratively reweighted LASSO (IRL) algorithm. Numerical simulations and a case study show that the proposed method is more capable in variable selection and more precise in parameter estimation.

Finally, we propose and study a unified approach via double penalized least squares, retaining the good features of both variable selection and model estimation in the framework of partially linear models. The proposed method is distinguished from others in that the penalty function combines the L1 penalty, arising from wavelet thresholding in the nonparametric component, with the SCAD penalty in the parametric component. Simulations investigate the performance of the proposed estimator in various settings, illustrating its effectiveness for simultaneous variable selection and estimation.
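Both the CD/IRL implementation of the EXP penalty (each IRL step solves a weighted LASSO subproblem) and the L1 wavelet thresholding above rest on the soft-thresholding operator. The abstract does not specify the exact EXP form, so as a minimal sketch, here is coordinate descent for the plain LASSO subproblem; a reweighted scheme would simply vary the per-coordinate threshold across outer iterations.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: closed-form minimizer of the one-dimensional
    L1-penalized least squares problem (also used in wavelet thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_lasso(X, y, lam, n_iter=100, tol=1e-8):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Cycles through coordinates, updating each by soft-thresholding its
    partial-residual fit while keeping the others fixed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column curvature x_j'x_j / n
    r = y.copy()                        # running residual y - X @ beta
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            # Fit of coordinate j to the partial residual (others fixed)
            z = X[:, j] @ r / n + col_sq[j] * beta[j]
            b_new = soft_threshold(z, lam) / col_sq[j]
            delta = b_new - beta[j]
            if delta != 0.0:
                r -= delta * X[:, j]    # keep the residual in sync
                beta[j] = b_new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return beta
```

The residual update makes each coordinate step O(n), which is the property that makes coordinate descent attractive for high-dimensional problems.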
Keywords/Search Tags: High-dimensional data, Variable selection, Sparse regularization, Penalty function, Oracle property, Linear regression model, High-dimensional partially linear models