Penalization Methods for Group Identification and Variable Selection in Models with Correlated Predictors

Posted on: 2011-10-09
Degree: Ph.D.
Type: Dissertation
University: North Carolina State University
Candidate: Sharma, Dhruv Bhushan
GTID: 1460390011471018
Subject: Statistics
Abstract/Summary:
This dissertation consists of two projects on the study and development of penalization methods for group identification and variable selection in models with correlated predictors. Statistical procedures for variable selection have become integral elements of any analysis. Successful procedures combine high predictive accuracy with interpretable models and computational efficiency. Penalized methods that perform coefficient shrinkage have been shown to be successful in many cases. Models with correlated predictors are particularly challenging, and they are the main focus of this dissertation.

In the first part of this dissertation we focus on developing a penalization method for regression models. We propose a penalization procedure that performs variable selection while automatically clustering predictors into groups. The oracle properties of this procedure, including consistency in group identification, are also studied. An efficient algorithm based on a quadratic approximation is proposed. The procedure compares favorably with existing selection approaches in both prediction accuracy and model discovery, while retaining its computational efficiency.

In the second part we focus on variable selection in high dimensional binary classification problems. Gene selection in studies of disease classification using gene expression data is a challenging problem due to the "high dimensional low sample size" nature of the data. Support vector machines, a classification tool that performs well on "high dimensional low sample size" data, have recently been modified to perform simultaneous gene selection and disease classification. Such studies are difficult to analyze since many genes that predict disease are often also correlated. To this end we propose a penalization approach that smooths coefficients together, setting them equal, while eliminating redundant variables, thus aiding in the classification of disease. This approach is shown to be superior in many cases where genes are correlated. Additional advantages of this method over existing methods include the data-adaptive nature of the penalty and its computational convenience, including an easily applicable algorithm. The procedure compares favorably with existing selection approaches in both classification accuracy and model discovery, in simulation studies and in the analysis of microarray gene expression cancer classification data, while retaining its computational efficiency.
Keywords/Search Tags:Variable selection, Models with correlated, Penalization, Methods, Identification, Classification, Computational efficiency, Data