Penalization Methods for Group Identification and Variable Selection in Models with Correlated Predictors

Posted on:2011-10-09

Degree:Ph.D

Type:Dissertation

University:North Carolina State University

Candidate:Sharma, Dhruv Bhushan

Full Text:PDF

GTID:1460390011471018

Subject:Statistics

Abstract/Summary:

This dissertation consists of two projects on the study and development of penalization methods for group identification and variable selection in models with correlated predictors. Statistical procedures for variable selection have become integral elements of any analysis. Successful procedures combine high predictive accuracy with interpretable models and computational efficiency. Penalized methods that perform coefficient shrinkage have been shown to be successful in many cases. Models with correlated predictors are particularly challenging to tackle, and these are the main focus of this dissertation.

In the first part of this dissertation we focus on developing a penalization method for regression models. We propose a penalization procedure that performs variable selection while clustering groups of predictors automatically. The oracle properties of this procedure, including consistency in group identification, are also studied. An efficient algorithm based on a quadratic approximation is proposed. The procedure compares favorably with existing selection approaches in both prediction accuracy and model discovery, while retaining its computational efficiency.

In the second part we focus on variable selection in high dimensional binary classification problems. Gene selection in studies of disease classification using gene expression data is a challenging problem due to the "high dimensional low sample size" nature of the data. Support vector machines, a classification tool that performs well on "high dimensional low sample size" data, have recently been modified to perform simultaneous gene selection and disease classification. Such studies are difficult to analyze because many genes that predict disease are also correlated with one another. To this end we propose a penalization approach that smooths coefficients together, setting them equal, while eliminating redundant variables, thus aiding in the classification of disease. This approach is shown to be superior in many cases where genes are correlated. Additional advantages of this method over existing methods include the data-adaptive nature of the penalty and its computational convenience, including an easily applicable algorithm. The procedure compares favorably with existing selection approaches in both classification accuracy and model discovery, in simulation studies and in the analysis of microarray gene expression cancer classification data, while retaining its computational efficiency.

Keywords/Search Tags:

Variable selection, Models with correlated, Penalization, Methods, Identification, Classification, Computational efficiency, Data

Related items

1	The Application Of Logistic Regression With Penalization In Financial Distress Prediction Of Listed Enterprises
2	The Parameter Estimation And Variable Selection In High Dimensional Collinearity Models
3	Variable Selection Methods In Statistical Models For Survival Data
4	Statistical Methods for Multivariate and Correlated Dat
5	Model Selection For Analyzing High-dimensional, Strongly Correlated Data
6	Resampling methods for variable selection and classification: Applications to genomics
7	Structure Identification, Variable Selection And Robust Estimation For Some Semiparametric Models With High Dimensional Complicated Data
8	Research Of Group Variable Selection Based On Adaptive Elastic Net With Strongly Correlated Data
9	Bayesian Variable Selection for High Dimensional Data Analysis
10	Study Of DNA Microarray Data Of Variable Selection Methods