Variable Selection Methods for Longitudinal Data

Posted on: 2012-03-19
Degree: Ph.D.
Type: Dissertation
University: Harvard University
Candidate: Cui, Rain
Full Text: PDF
GTID: 1460390011468824
Subject: Biology
Abstract/Summary:
We present two main additions to the work on variable selection methods for longitudinal data. First, we consider variable selection in linear mixed models (LMM) for longitudinal continuous data. We propose a regularized log-likelihood for variable selection of fixed effects. Three penalty functions are considered: the LASSO, adaptive LASSO, and SCAD penalties. We show that the maximized regularized likelihood estimator of the regression coefficients can be equivalently obtained by jointly maximizing the penalized likelihood of the random effects and the fixed effects. We also extend the restricted maximum likelihood (REML) for estimation of variance components to account for variable selection. The performance of the proposed methods is evaluated using simulations; the results demonstrate that the longitudinal variable selection methods work well. The methods are further illustrated through application to an HIV codon dataset.

In addition, we investigate various tuning parameter selectors for variable selection and propose several AIC-type and BIC-type criteria. Simulation results show that the performance of the criteria depends on the covariate signal strength and the size of the dataset. The multiple criteria are also used in the HIV codon dataset analysis; again, results differed dramatically by variable selection method and tuning parameter selector.

Our second main area of work is variable selection in generalized linear mixed models (GLMM) for longitudinal discrete responses. In particular, we define fixed effects variable selection for the penalized quasi-likelihood (PQL) procedure for the GLMM. As the PQL uses LMM theory for fixed effects parameter estimation, we apply the regularized log-likelihood theory we proposed for variable selection with longitudinal continuous outcomes. We again suggest extending the REML to estimate the variance components. We also examine a few tuning parameter selection criteria for variable selection for the GLMM. Simulations are run to study the methods' performance; they show that the methods work relatively well. In application to the HIV dataset, however, the methods appear to underfit the model. This disparity in performance could be explained by the weak signal strength of the HIV dataset.
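
For readers unfamiliar with the three penalties named in the abstract, the following is a minimal sketch of their standard textbook forms for a single coefficient. It is not taken from the dissertation: the function names, the adaptive-LASSO weight exponent `gamma`, and the SCAD constant `a = 3.7` (a commonly used default) are assumptions made here for illustration, and the dissertation's exact parameterization may differ.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * abs(beta)


def adaptive_lasso_penalty(beta, lam, beta_init, gamma=1.0):
    """Adaptive LASSO penalty: lam * |beta| / |beta_init|**gamma.

    beta_init is an initial (unpenalized) estimate of the coefficient and is
    assumed to be nonzero; gamma is a hypothetical weight exponent.
    """
    return lam * abs(beta) / abs(beta_init) ** gamma


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in its usual piecewise form (requires a > 2).

    Linear near zero (like the LASSO), quadratic in the middle region,
    and constant for large |beta|, so large coefficients are not
    penalized further.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2
```

In a penalized fit, one such term per fixed-effect coefficient would be summed and subtracted from the log-likelihood, with the tuning parameter `lam` chosen by a selector such as the AIC-type or BIC-type criteria the abstract mentions.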
Keywords/Search Tags:Variable selection, Data, Longitudinal, HIV, Fixed effects, Performance, Work