Regularized regression methods for variable selection and estimation

Posted on: 2011-10-21
Degree: Ph.D.
Type: Dissertation
University: Harvard University
Candidate: Dicker, Lee Herbrandson
Full Text: PDF
GTID: 1440390002464864
Subject: Biology
Abstract/Summary:
We make two contributions to the body of work on the variable selection and estimation problem. First, we propose a new penalized likelihood procedure, the seamless-L0 (SELO) method, which uses a continuous penalty function that closely approximates the discontinuous L0 penalty. The SELO penalized likelihood procedure consistently selects the correct variables and is asymptotically normal, provided the number of variables grows more slowly than the number of observations. The SELO method is efficiently implemented using a coordinate descent algorithm. Tuning parameter selection is crucial to the performance of the SELO procedure. We propose a BIC-like tuning parameter selection method for SELO which consistently identifies the correct model, even if the number of variables diverges. Simulation results show that the SELO procedure with BIC tuning parameter selection performs very well in a variety of settings, outperforming other popular penalized likelihood procedures by a substantial margin. Using SELO, we analyze a publicly available HIV drug resistance and mutation dataset and obtain interpretable results.

Our second contribution is the development of techniques for estimating-equation-based variable selection. We use the Dantzig selector, a variable selection and estimation procedure based on the normal score equations, as a template. After deriving new asymptotic results for the Dantzig selector, we propose the adaptive Dantzig selector, an extension of the Dantzig selector which consistently selects the correct variables and is asymptotically normal. We show that the adaptive Dantzig selector outperforms the Dantzig selector in various simulated settings. Finally, we show that the Dantzig selector may be extended to handle many different types of data, provided a reasonable estimating equation is available; a full likelihood model for the data is not necessary. Our generalization of the Dantzig selector for estimating equations has good asymptotic properties, which are similar in flavor to those of the adaptive Dantzig selector. As an example, we consider the application of the Dantzig selector to generalized estimating equations (GEEs). We show that the performance of variable selection and estimation procedures may be improved by using GEEs to account for excess correlation which may be present in the data.
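The abstract describes the SELO penalty only in words: a continuous function that closely approximates the discontinuous L0 penalty. The sketch below illustrates that idea. The functional form used, (lam / log 2) * log(|beta| / (|beta| + tau) + 1), is an assumption taken from the later published description of SELO and is not stated in this abstract; the names `selo_penalty`, `l0_penalty`, `lam`, and `tau` are illustrative only.

```python
import math


def selo_penalty(beta, lam=1.0, tau=0.01):
    """Assumed SELO form (not given in the abstract):
    (lam / log 2) * log(|beta| / (|beta| + tau) + 1), with tau > 0.
    """
    a = abs(beta)
    return (lam / math.log(2)) * math.log(a / (a + tau) + 1.0)


def l0_penalty(beta, lam=1.0):
    """Discontinuous L0 penalty: lam for any nonzero coefficient, else 0."""
    return lam if beta != 0 else 0.0


if __name__ == "__main__":
    # Compare the two penalties on a few coefficient values.
    for b in (0.0, 0.001, 0.01, 0.1, 1.0, 10.0):
        print(b, round(selo_penalty(b), 4), l0_penalty(b))
```

Under this assumed form the penalty is exactly zero at beta = 0, rises continuously, and approaches lam once |beta| is large relative to tau, so a smaller tau gives a closer approximation to the L0 step.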
Keywords/Search Tags: Variable selection, Dantzig selector, SELO, Procedure