
High dimensional classification and variable selection

Posted on: 2014-02-10 | Degree: Ph.D. | Type: Dissertation
University: The University of Wisconsin-Madison | Candidate: Li, Quefeng
GTID: 1450390005493373 | Subject: Statistics
Abstract/Summary:
Recent advances in biotechnology and other disciplines have produced large amounts of high-dimensional data, creating a need for new statistical methodologies to handle them.

This dissertation focuses on two aspects of high-dimensional data inference: (1) classification based on high-dimensional covariates and (2) variable selection in high-dimensional linear regression models. Both aspects are important in high-dimensional inference, and they are related to each other. Variable selection plays a critical role in reducing the dimension of the data: it usually boosts the signal-to-noise ratio and yields a simpler model that is much easier to interpret. Classification has many important practical applications, such as face detection and handwriting recognition.

For the high-dimensional classification problem, I have developed a new Sparse Quadratic Discriminant Analysis (SQDA) approach, which extends traditional Quadratic Discriminant Analysis beyond the low-dimensional setting it was designed for. The theoretical properties of SQDA are thoroughly addressed. Simulation studies compare SQDA with many other well-known classifiers in the literature, and the approach is also applied to a dataset from a colon cancer study.

For the variable selection problem, I propose a Regularized LASSO approach, which relaxes the strong conditions the classical LASSO requires in order to perform well. The Regularized LASSO includes many other well-known variable selection methods as special cases, which makes it a very general approach. Its asymptotic properties are thoroughly studied: under mild assumptions, the Regularized LASSO asymptotically identifies the correct model. In simulation studies, the new method outperforms many other variable selection methods.
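The abstract does not give the formulation of the Regularized LASSO, so the sketch below shows only the classical LASSO it builds on, to illustrate how an L1 penalty performs variable selection by setting coefficients exactly to zero. It is a minimal sketch under stated assumptions: the objective is taken to be (1/(2n))||y - Xb||^2 + lam*||b||_1, no intercept is fitted (y and the columns of X are assumed centered), and it is solved by cyclic coordinate descent with soft-thresholding, a standard LASSO solver. The function names `soft_threshold` and `lasso_cd` are hypothetical and are not taken from the dissertation.

```python
def soft_threshold(a: float, t: float) -> float:
    """Shrink a toward zero by t; returns exactly 0.0 when |a| <= t."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Classical LASSO (not the Regularized LASSO) via cyclic coordinate descent.

    Assumed objective: (1/(2n)) * ||y - X b||^2 + lam * ||b||_1, no intercept.
    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Residual r = y - X b; equals y because b starts at zero.
    r = list(y)
    # Per-column scale z_j = (1/n) * sum_i X_ij^2.
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # rho_j = (1/n) * X_j^T (r + X_j b_j): fit of column j to the partial residual.
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved appreciably in a full sweep
    return b


if __name__ == "__main__":
    # Orthogonal toy design: each column has mean 0 and (1/n) * sum of squares = 1.
    X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
    y = [3.5, 2.5, -2.5, -3.5]  # = 3 * column 1 + 0.5 * column 2
    print(lasso_cd(X, y, 1.0))  # prints [2.0, 0.0, 0.0]
```

On this orthogonal toy design the LASSO solution is each column's correlation with y (3, 0.5 and 0) soft-thresholded by lam, so with lam = 1 only the first variable survives, shrunk from 3 to 2. The weak second signal is dropped entirely, which is the selection behavior the abstract refers to; the Regularized LASSO's way of relaxing the classical LASSO's conditions is not reproduced here.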
Keywords/Search Tags: Variable selection, Regularized LASSO, High-dimensional data, Classification