
Regularized methods for high-dimensional and bi-level variable selection

Posted on: 2010-08-01
Degree: Ph.D.
Type: Thesis
University: The University of Iowa
Candidate: Breheny, Patrick John
GTID: 2440390002472170
Subject: Biology
Abstract/Summary:
Many traditional approaches to statistical analysis cease to be useful when the number of variables is large in comparison with the sample size. Penalized regression methods have proved to be an attractive approach to these problems, both theoretically and empirically. This thesis focuses on the development of penalized regression methods for high-dimensional variable selection.

The first part of this thesis deals with problems in which the covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. I introduce a framework for grouped penalization that encompasses the previously proposed group lasso and group bridge methods, sheds light on the behavior of grouped penalties, and motivates the proposal of a new method, group MCP.

The second part of this thesis develops fast algorithms for fitting models with complicated penalty functions, such as grouped penalties. These algorithms combine local approximation of the penalty function with recent research on coordinate descent algorithms to produce highly efficient numerical methods. Importantly, I show these algorithms to be stable and to have a computational cost that grows linearly with the dimension of the feature space, allowing them to be scaled up efficiently to very large problems.

In the third part of this thesis, I extend the idea of false discovery rates to penalized regression. I show how the Karush-Kuhn-Tucker conditions describing penalized regression estimates provide testable hypotheses involving partial residuals, thus connecting the previously disparate fields of multiple comparisons and penalized regression. I then propose two approaches to estimating false discovery rates for penalized regression methods and examine their accuracy.

Finally, the methods from all three parts are studied in a number of simulations and applied to real data from microarray and genetic association studies.
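The grouped-penalization framework from the first part can be sketched as an outer penalty applied to a sum of inner penalties over each group's coefficients, Σ_j f_O(Σ_k f_I(|β_jk|)). The minimal Python sketch below assumes that composite form. The helper names (`mcp`, `grouped_penalty`, `group_lasso`, `group_bridge`, `group_mcp`), the tuning values, the group-size weights, and the outer tuning parameter `gamma_out` for group MCP are all hypothetical choices for illustration and do not reproduce the thesis's exact parameterization.

```python
import numpy as np

def mcp(theta, lam, gamma):
    """Minimax concave penalty, elementwise:
    lam*t - t^2/(2*gamma) if t <= gamma*lam, else gamma*lam^2/2."""
    t = np.abs(np.asarray(theta, dtype=float))
    return np.where(t <= gamma * lam, lam * t - t**2 / (2 * gamma), 0.5 * gamma * lam**2)

def grouped_penalty(beta, groups, outer, inner):
    """Sum over groups j of outer(sum_k inner(|beta_jk|), K_j), K_j = group size."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        b = np.abs(beta[groups == g])
        total += float(outer(np.sum(inner(b)), b.size))
    return total

lam, gamma, gamma_out = 0.5, 3.0, 3.0  # hypothetical tuning values

# Group lasso: squared inner penalty, outer lam*sqrt(K_j)*sqrt(s) = lam*sqrt(K_j)*||beta_j||_2
def group_lasso(beta, groups):
    return grouped_penalty(beta, groups, lambda s, K: lam * np.sqrt(K * s), np.square)

# Group bridge (exponent 1/2): absolute-value inner penalty, outer lam*sqrt(K_j)*s^(1/2)
def group_bridge(beta, groups):
    return grouped_penalty(beta, groups, lambda s, K: lam * np.sqrt(K) * np.sqrt(s), np.abs)

# Group MCP as a composite: an outer MCP applied to a sum of inner MCPs
def group_mcp(beta, groups):
    return grouped_penalty(beta, groups,
                           lambda s, K: mcp(s, lam, gamma_out),
                           lambda b: mcp(b, lam, gamma))
```

For example, with `beta = [3, 4, 0, 0]` and `groups = [1, 1, 2, 2]`, the second group contributes nothing under all three penalties. Under the MCP-based penalty, the large coefficients in the first group reach the flat part of the inner MCP, so the first group's contribution stops growing; this flattening is the feature that distinguishes the MCP-based penalty from the group lasso.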
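The coordinate descent idea from the second part can be illustrated in its simplest setting, MCP-penalized linear regression. This minimal sketch assumes standardized columns (x_j'x_j/n = 1), a centered response, and gamma > 1. It uses the closed-form univariate MCP solution instead of the local penalty approximation the thesis develops for grouped penalties, and `mcp_cd` is a hypothetical name. Because the residual vector is updated in place, each coordinate update costs O(n), so one full pass over the p coordinates costs O(np), which is linear in p.

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding operator S(z, lam)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def mcp_cd(X, y, lam, gamma=3.0, tol=1e-8, max_iter=1000):
    """Coordinate descent for (1/2n)||y - X b||^2 + sum_j MCP(|b_j|; lam, gamma).
    Assumes columns of X are centered with x_j'x_j/n = 1, y is centered, and gamma > 1."""
    n, p = X.shape
    beta = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # current residual y - X beta
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # O(n): correlation of x_j with the partial residual y - X_{-j} beta_{-j}
            z = X[:, j] @ r / n + beta[j]
            # Closed-form univariate MCP solution ("firm" thresholding)
            new = soft(z, lam) / (1.0 - 1.0 / gamma) if abs(z) <= gamma * lam else z
            delta = new - beta[j]
            if delta != 0.0:
                r -= delta * X[:, j]  # O(n) residual update keeps r consistent with beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

On an orthogonal design, a coefficient whose partial-residual correlation exceeds gamma*lam is left unshrunk. This unbiasedness for large effects is the main practical difference from the lasso, which shrinks every nonzero coefficient by lam.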
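The link between the Karush-Kuhn-Tucker conditions and partial residuals in the third part can be made concrete for the lasso, the simplest penalized regression estimator. This sketch assumes standardized columns (x_j'x_j/n = 1) and the objective (1/2n)||y - Xβ||² + λ||β||₁. Under these assumptions, β_j = 0 exactly when |z_j| ≤ λ, where z_j = x_j'r_{-j}/n is the correlation of x_j with the partial residual r_{-j} = y - X_{-j}β_{-j}. Selection therefore reduces to a statement about that correlation. The function names are hypothetical, and the thesis's treatment covers more general penalties.

```python
import numpy as np

def partial_residual_scores(X, y, beta):
    """z_j = x_j' r_{-j} / n with r_{-j} = y - X beta + x_j beta_j.
    Assumes x_j'x_j/n = 1, so z_j = x_j' r / n + beta_j."""
    n = X.shape[0]
    r = y - X @ beta
    return X.T @ r / n + beta

def lasso_kkt_ok(X, y, beta, lam, tol=1e-6):
    """Check the lasso KKT conditions for (1/2n)||y - X beta||^2 + lam*||beta||_1:
    x_j'r/n = lam*sign(beta_j) if beta_j != 0, and |x_j'r/n| <= lam if beta_j == 0."""
    n = X.shape[0]
    grad = X.T @ (y - X @ beta) / n  # x_j' r / n for every j
    active = beta != 0
    ok_active = np.all(np.abs(grad[active] - lam * np.sign(beta[active])) <= tol)
    ok_inactive = np.all(np.abs(grad[~active]) <= lam + tol)
    return bool(ok_active and ok_inactive)
```

On an orthonormal design, the lasso solution is β_j = S(x_j'y/n, λ). Any other candidate fails the check above, because some active coefficient's gradient no longer equals λ·sign(β_j).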
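One simple way to turn the partial-residual rule into a false discovery estimate is sketched below. The sketch assumes that a null feature's score z_j is approximately N(0, σ²/n) and that feature j is selected when |z_j| > λ. Under those assumptions, the expected number of null features selected is about p_null · 2Φ(−λ√n/σ). This is a hedged illustration of the general idea, not either of the two approaches proposed in the thesis. The caller must supply σ and the number of null features (conservatively all p), and both function names are hypothetical.

```python
from math import erfc, sqrt

def expected_null_selections(p_null, n, lam, sigma):
    """Expected count of null features with |z_j| > lam, assuming z_j ~ N(0, sigma^2/n).
    Uses 2*Phi(-x) = erfc(x / sqrt(2)) with x = lam*sqrt(n)/sigma."""
    return p_null * erfc(lam * sqrt(n) / (sigma * sqrt(2)))

def estimated_fdr(p_null, n, lam, sigma, n_selected):
    """Ratio of expected null selections to observed selections, capped at 1."""
    if n_selected == 0:
        return 0.0
    return min(1.0, expected_null_selections(p_null, n, lam, sigma) / n_selected)
```

For example, with 1000 candidate features, n = 100, λ = 0.3, and σ = 1, about 2.7 null features are expected to pass the threshold. If 27 features are selected, the estimated false discovery rate is roughly 0.1.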
Keywords/Search Tags: Methods, Penalized regression