
Generalized linear models with regularization

Posted on: 2007-04-11
Degree: Ph.D
Type: Thesis
University: Stanford University
Candidate: Park, Mee Young
Full Text: PDF
GTID: 2440390005964331
Subject: Statistics
Abstract/Summary:
Penalizing the size of the coefficients is a common strategy for robust modeling in regression and classification with high-dimensional data. This thesis examines the properties of L2 norm and L1 norm constraints applied to the coefficients in generalized linear models (GLM).

In the first part of the thesis, we propose fitting logistic regression with a quadratic penalty on the coefficients for the specific application of modeling gene interactions. Logistic regression is traditionally a popular way to model a binary response variable; however, it has been criticized for the difficulty of estimating a large number of parameters from a small number of samples, which is the typical situation in gene-interaction models. We show that the slight modification of adding an L2 norm constraint to logistic regression makes it possible to handle such data and yields reasonable prediction performance. We implement it in conjunction with a forward stepwise variable selection procedure.

We also study generalized linear models with an L1 norm constraint on the coefficients, focusing on the regularization path algorithm. The L1 norm constraint yields a sparse fit, and different sets of variables are selected at different levels of regularization; it is therefore meaningful to track how the active set changes along the path and to choose the optimal model complexity. Following the idea of the Lars-Lasso path proposed by Efron, Hastie, Johnstone & Tibshirani (2004), we generalize the algorithm to the piecewise-smooth coefficient paths of GLM, using a predictor-corrector scheme to trace the nonlinear path. Furthermore, we extend our procedure to fit the Cox proportional hazards model, again penalizing the L1 norm of the coefficients.

For the final part of the thesis, having studied the forward stepwise variable selection procedure with L2-penalized logistic regression and the L1 regularization path algorithm for GLM, we merge these two earlier approaches.
Specifically, we consider several regularization path algorithms with grouped variable selection for gene-interaction models, the setting we earlier fit with stepwise logistic regression. We examine the group-Lars and group-Lasso methods introduced in Yuan & Lin (2006) and also propose a new version of group-Lars. All of these regularization methods with automatic grouped variable selection are compared to our stepwise logistic regression scheme, which selects groups of variables in a greedy manner.
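The idea of tracking the active set along an L1 regularization path can be illustrated with a simplified stand-in: rather than following the exact piecewise-smooth path with a predictor-corrector scheme as the thesis does, the sketch below refits L1-penalized logistic regression on a decreasing grid of penalties with scikit-learn and records which coefficients are nonzero at each step. The data, grid, and tolerance are illustrative assumptions, not taken from the thesis.

```python
# Approximate L1 path for logistic regression: solve at a grid of
# penalty levels and record the active set at each one. This is a
# grid approximation, not the thesis's exact path algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                  # only 3 truly active predictors
y = (X @ beta + rng.normal(size=n) > 0).astype(int)

active_sets = []
for C in np.logspace(-2, 1, 8):              # C = 1/lambda: heavy -> light penalty
    m = LogisticRegression(penalty="l1", C=C, solver="liblinear")
    m.fit(X, y)
    active = set(np.flatnonzero(np.abs(m.coef_[0]) > 1e-8))
    active_sets.append(active)
    print(f"C={C:8.3f}  active set: {sorted(active)}")
```

As the penalty is relaxed, variables enter the active set one group of steps at a time, which is exactly the information one tracks to choose the model complexity.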
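The grouped selection behavior of the group-Lasso can be sketched with a minimal proximal gradient (ISTA) solver for the linear-regression version of the penalty, where block soft-thresholding zeroes out an entire group of coefficients at once. This is a generic illustration under assumed data and a hand-picked penalty level, not the algorithms compared in the thesis.

```python
# Sketch: group-Lasso for linear regression via proximal gradient descent.
# Objective: (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2.
import numpy as np

def group_lasso(X, y, groups, lam, iters=3000):
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2       # 1/L for the smooth part
    beta = np.zeros(p)
    for _ in range(iters):
        z = beta + step * X.T @ (y - X @ beta) / n   # gradient step
        for g in groups:                              # block soft-thresholding
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            beta[g] = 0.0 if norm <= thresh else (1 - thresh / norm) * z[g]
    return beta

rng = np.random.default_rng(0)
n = 100
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
X = rng.normal(size=(n, 9))
b_true = np.array([2.0, -1.0, 1.5, 0, 0, 0, 0, 0, 0])  # only group 1 active
y = X @ b_true + 0.3 * rng.normal(size=n)

beta = group_lasso(X, y, groups, lam=0.4)
print(np.round(beta, 2))                       # groups 2 and 3 are exactly zero
```

The key property on display is that a group's coefficients are set to zero together or kept together, in contrast to a greedy stepwise scheme that adds one group at a time.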
Keywords/Search Tags: Generalized linear models, Logistic regression, Variable selection, Regularization, L1 norm, Coefficients, Stepwise