
Generalized linear models with regularization

Posted on: 2007-04-11
Degree: Ph.D
Type: Thesis
University: Stanford University
Candidate: Park, Mee Young
Full Text: PDF
GTID: 2440390005964331
Subject: Statistics
Abstract/Summary:
Penalizing the size of the coefficients is a common strategy for robust modeling in regression and classification with high-dimensional data. This thesis examines the properties of L2 norm and L1 norm constraints applied to the coefficients in generalized linear models (GLM).

In the first part of the thesis, we propose fitting logistic regression with a quadratic penalty on the coefficients for the specific application of modeling gene interactions. Logistic regression is traditionally a popular way to model a binary response variable; however, it has been criticized for the difficulty of estimating a large number of parameters from a small number of samples, which is the typical situation in gene-interaction models. We show that the slight modification of adding an L2 norm constraint to logistic regression makes it possible to handle such data and yields reasonable prediction performance. We implement it in conjunction with a forward stepwise variable selection procedure.

We also study generalized linear models with an L1 norm constraint on the coefficients, focusing on the regularization path algorithm. The L1 norm constraint yields a sparse fit, and different sets of variables are selected at different levels of regularization; it is therefore meaningful to track how the active set changes along the path and to choose the optimal model complexity. Following the idea of the Lars-Lasso path proposed by Efron, Hastie, Johnstone & Tibshirani (2004), we generalize the algorithm to the piecewise-smooth coefficient paths of GLM, using a predictor-corrector scheme to trace the nonlinear path. Furthermore, we extend our procedure to fit the Cox proportional hazards model, again penalizing the L1 norm of the coefficients.

For the final part of the thesis, having studied the forward stepwise variable selection procedure with L2-penalized logistic regression and the L1 regularization path algorithm for GLM, we merge these two earlier approaches.
Specifically, we consider several regularization path algorithms with grouped variable selection for gene-interaction models, the setting we earlier fit with stepwise logistic regression. We examine the group-Lars and group-Lasso methods introduced in Yuan & Lin (2006) and also propose a new version of group-Lars. All of these regularization methods with automatic grouped variable selection are compared to our stepwise logistic regression scheme, which selects groups of variables in a greedy manner.
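The idea of tracking the active set along an L1 regularization path can be illustrated with a simplified stand-in: rather than following the exact piecewise-smooth path with a predictor-corrector scheme as the thesis does, the sketch below refits L1-penalized logistic regression on a decreasing grid of penalties with scikit-learn and records which coefficients are nonzero at each step. The data, grid, and tolerance are illustrative assumptions, not taken from the thesis.

```python
# Approximate L1 path for logistic regression: solve at a grid of
# penalty levels and record the active set at each one. This is a
# grid approximation, not the thesis's exact path algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                  # only 3 truly active predictors
y = (X @ beta + rng.normal(size=n) > 0).astype(int)

active_sets = []
for C in np.logspace(-2, 1, 8):              # C = 1/lambda: heavy -> light penalty
    m = LogisticRegression(penalty="l1", C=C, solver="liblinear")
    m.fit(X, y)
    active = set(np.flatnonzero(np.abs(m.coef_[0]) > 1e-8))
    active_sets.append(active)
    print(f"C={C:8.3f}  active set: {sorted(active)}")
```

As the penalty is relaxed, variables enter the active set one group of steps at a time, which is exactly the information one tracks to choose the model complexity.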
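The grouped selection behavior of the group-Lasso can be sketched with a minimal proximal gradient (ISTA) solver for the linear-regression version of the penalty, where block soft-thresholding zeroes out an entire group of coefficients at once. This is a generic illustration under assumed data and a hand-picked penalty level, not the algorithms compared in the thesis.

```python
# Sketch: group-Lasso for linear regression via proximal gradient descent.
# Objective: (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2.
import numpy as np

def group_lasso(X, y, groups, lam, iters=3000):
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2       # 1/L for the smooth part
    beta = np.zeros(p)
    for _ in range(iters):
        z = beta + step * X.T @ (y - X @ beta) / n   # gradient step
        for g in groups:                              # block soft-thresholding
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            beta[g] = 0.0 if norm <= thresh else (1 - thresh / norm) * z[g]
    return beta

rng = np.random.default_rng(0)
n = 100
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
X = rng.normal(size=(n, 9))
b_true = np.array([2.0, -1.0, 1.5, 0, 0, 0, 0, 0, 0])  # only group 1 active
y = X @ b_true + 0.3 * rng.normal(size=n)

beta = group_lasso(X, y, groups, lam=0.4)
print(np.round(beta, 2))                       # groups 2 and 3 are exactly zero
```

The key property on display is that a group's coefficients are set to zero together or kept together, in contrast to a greedy stepwise scheme that adds one group at a time.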
Keywords/Search Tags: Generalized linear models, Logistic regression, Variable selection, Regularization, L1 norm, Coefficients, Stepwise