
Variable Selection in High-Dimensional Setup: A Detailed Illustration Through Marketing and MRI Dat

Posted on: 2018-03-13
Degree: Ph.D
Type: Thesis
University: Michigan State University
Candidate: Majumder, Atreyee
GTID: 2470390020455967
Subject: Statistics
Abstract/Summary:
In an era of big data and ever-growing information, variable selection is an integral part of statistical analysis. Advances in technology let us store and access large volumes of data, only part of which is required for inference. Variable selection is a statistical technique that retains valuable information while discarding variables that are not significant.

To understand variable selection, we perform a comparative study of popular frequentist variable selection techniques. This study analyzes the difference in performance of models based on the Ridge, LASSO and Elastic Net methods of penalized regression. The methods are compared for both continuous and binary outcomes. We further emphasize the importance of tuning parameter selection in penalized regression models by comparing six different methods of tuning parameter selection for each penalized approach. The best-performing method is then used to build statistical models for market research data from four different countries. This exercise is an application of variable selection: we showcase how such models handle large amounts of information efficiently for managerial decisions, and we show how managers can leverage this technique for better resource allocation in their business decisions.

Next, we build a model for variable selection in a Bayesian setup, motivated by the fact that the frequentist approaches yield unstable inference. Here, we analyze data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) by building a Bayesian hierarchical model with multivariate Laplace priors in the style of a spike-and-slab prior. This model is able to select a group of related variables. The frequentist counterpart of this estimator, the group lasso, is also discussed. We build a classification model that is able to select the significant brain regions in Alzheimer's disease with 80% accuracy. Instead of using standard MAP thresholding, we use posterior median thresholding for variable selection. Furthermore, the consistency of this estimator is proved.

Lastly, we build a Bayesian structured model for variable selection based on magnetic resonance imaging (MRI) data. This model extends the second method by taking into account bi-level selection and spatio-temporal correlation: voxels within brain regions are spatially correlated, and repeated measurements on each voxel introduce temporal correlation. The model is applied to simulated functional MRI (fMRI)-type data and to real data. The real data capture blood oxygenation level dependent (BOLD) activation, and are large on account of the numerous voxels present in the brain. Our method successfully detects the activated brain regions in the presence of a stimulus.

Thus, this thesis delves into various scenarios of variable selection through three different real-data application studies. The focus is mainly on Bayesian variable selection and the use of hierarchical modeling with iterative sampling from the posterior distribution in the group lasso setup. Our use of the group lasso structure to identify brain regions and voxels is an innovative approach in the context of the present literature. All of these methods have practical implications and can be used to solve relevant real-world problems.
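The contrast between Ridge and LASSO that the abstract refers to can be illustrated with a minimal sketch. This is not the thesis's code or data: the simulated design, the penalty values, and the helper names (`ridge`, `lasso_cd`, `soft_threshold`) are all assumptions chosen for illustration, and the LASSO solver is a plain coordinate-descent routine rather than any particular library's implementation. The point is only that the L1 penalty sets some coefficients exactly to zero (variable selection), whereas the L2 penalty shrinks them without removing any.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # remove feature j's contribution
            rho = X[:, j] @ resid / n       # correlation with partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated contribution back
    return beta

# Simulated data (an assumption): 10 predictors, only the first 3 are relevant.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.5 * rng.standard_normal(n)

beta_ridge = ridge(X, y, lam=10.0)       # illustrative penalty value
beta_lasso = lasso_cd(X, y, lam=0.2)     # illustrative penalty value
selected = np.flatnonzero(beta_lasso)    # indices LASSO keeps in the model

print("ridge coefficients:", np.round(beta_ridge, 3))
print("lasso coefficients:", np.round(beta_lasso, 3))
print("variables selected by lasso:", selected.tolist())
```

In practice the penalty values would be chosen by a tuning procedure such as cross-validation, which is the step the abstract emphasizes; the fixed values here are placeholders.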
Keywords/Search Tags:Variable selection, MRI, Data, Brain regions, Setup, Model, Real, LASSO