Complete Least Squares: A New Variable Screening and Selection Method
Posted on: 2013-01-29 | Degree: Ph.D | Type: Thesis
University: North Carolina State University | Candidate: Reyes, Eric M
GTID: 2450390008484298 | Subject: Statistics

Abstract/Summary:

Variable selection methods have been the focus of much research, and many new methods have been established. These methods, however, can fail to correctly distinguish the informative (nonzero coefficient) predictors from the uninformative (zero coefficient) predictors when the number of predictors exceeds the sample size. To improve the performance of variable selection methods in these high-dimensional settings, a screening step can be used prior to selection. The screening step reduces the pool of potential predictors by eliminating those with the least evidence of being informative.

We introduce a new method for estimating the parameters in a linear model, called Complete Least Squares (CLS), and investigate its potential as a screening technique. The CLS estimator minimizes a weighted sum of the least squares objective functions for all possible linear models. We show that the CLS estimator is related to, and competitive with, the ridge regression estimator. We develop an ordering of the variables based on the CLS estimator and propose a screening technique based on this ordering. Using simulation studies, we show that screening via CLS is generally competitive with other methods found in the literature, and results in a more accurate ordering in some settings.

In the second part of this thesis, we consider variable selection for two-stage studies. During the first phase of a two-stage design, the response and (possibly) some predictors are measured for all subjects in the study. A subset of these subjects is then sampled into a second phase in which additional predictors are measured.
The probability of a subject being sampled into the second phase can depend on the response and predictors observed in the first phase. By design, the subjects not sampled into the second phase have missing data for the second-phase predictors. We review two approaches for conducting valid inference for two-stage studies. We then integrate these methods with the forward addition sequence to develop two variable selection methods. Using simulation studies, we show that our method based on the estimator of Breslow and Cain (1988) is very competitive. As two-stage studies are a special case of missing data, we generalize our variable selection methods to the monotone missing data problem.

Keywords/Search Tags: Selection, Variable, Least squares, Screening, New, CLS estimator, Missing data, Studies