Variable Selection Based On Distribution-weighted Least Square Under The Multivariate Normal Mixture Distribution Assumption For Predictor Vector

Posted on: 2020-04-30, Degree: Master, Type: Thesis
Country: China, Candidate: Z C Ye, Full Text: PDF
GTID: 2370330572980280, Subject: statistics
Abstract/Summary:
In nonparametric or semiparametric regression, the regression function is difficult to fit when the dimension of the predictor vector is high, so variable selection is needed before fitting. However, because the assumptions about the regression function are generally not very specific in these settings, none of the well-known variable selection methods (e.g. LASSO, SCAD, etc.) can be applied directly.

Suppose that, without any assumption about the form of the regression function, several linear combinations of the predictors can be found such that, given them, the response is conditionally independent of all the predictors. The relationship between the response and these combinations then contains all the information about the regression, and sparse combination coefficient vectors can be obtained by penalty methods for variable selection. Fortunately, approaches for obtaining such linear combinations of predictors exist; they are called sufficient dimension reduction. Variable selection in nonparametric or semiparametric regression based on sufficient dimension reduction has already received some attention, and several approaches have been proposed. However, these approaches depend on distributional assumptions such as the linear design condition, constant conditional variance, and so on. In practice, these assumptions can easily be violated. For example, the normal mixture distribution does not satisfy them.

In this thesis, we address variable selection in single-index models whose predictor vectors follow normal mixture distributions, based on sufficient dimension reduction. The proposed variable selection method is based on the distribution-weighted least square approach, an efficient sufficient dimension reduction method. The basic idea is to extend the distribution-weighted least square method to scenarios where the predictor vectors follow normal mixture distributions. Based on this extension, an objective function is constructed whose unique minimizer is a nonzero vector in the central subspace. The sample version of this objective function is then obtained by a resampling algorithm. Minimizing the sample objective function with a SCAD-type penalty yields a shrinkage estimate of a nonzero vector in the central subspace, which accomplishes the variable selection. The proposed methodology is illustrated by two simulated examples.
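As a rough illustration of the penalization step only, the sketch below implements the standard SCAD penalty of Fan and Li and adds it to a generic loss. It is not the thesis's objective function: the abstract does not give the distribution-weighted least square loss or the resampling algorithm, so `loss` is a hypothetical placeholder supplied by the caller, and the names `scad_penalty`, `penalized_objective`, `lam`, and `a` are assumptions made for this example. The default `a = 3.7` is the value commonly suggested for SCAD, not one stated in the abstract.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for one coefficient (illustrative sketch)."""
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same linear penalty as LASSO.
        return lam * t
    if t <= a * lam:
        # Middle range: quadratic piece that tapers the penalty growth.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


def penalized_objective(loss, beta, lam, a=3.7):
    """Hypothetical penalized objective: loss(beta) plus SCAD on each entry.

    `loss` stands in for a sample objective function; the thesis's actual
    objective is not specified in the abstract.
    """
    return loss(beta) + sum(scad_penalty(b, lam, a) for b in beta)
```

Because the penalty is linear near zero, small coefficients can be shrunk exactly to zero, which is what turns the shrinkage estimate into a variable selection rule; because it is flat beyond `a * lam`, large coefficients are left nearly unbiased.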
Keywords/Search Tags: Single-index model, Variable selection, Normal mixture distribution, Shrinkage estimate, Distribution-weighted least square