
Studies on Robust Variable Selection Methods

Posted on: 2014-03-29  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y L Fan  Full Text: PDF
GTID: 1220330434471269  Subject: Probability theory and mathematical statistics
Abstract/Summary:
Variable selection is a basic task that plays an important role in statistical modeling. To achieve better prediction, a good statistical model should contain only those few covariates that are truly related to the response variable. In the modeling process, we also want the variable selection method to be robust: when the data contain outliers, a robust variable selection method should resist their effect and perform stably. The purpose of this research is to propose a series of robust variable selection methods for longitudinal data or more complex high-dimensional censored data.
The main results and innovations of this dissertation are as follows.
First, we introduce robust variable selection in the linear regression model for longitudinal data. We establish a penalized robust estimating equation, give efficient algorithms, and prove that under certain conditions the proposed robust variable selection method has the oracle property (Fan et al. (2001) [26]). In the simulations, we compare the effects of several different penalty functions and compare the performance of the robust methods with that of non-robust methods when the data contain outliers. The proposed methods are also illustrated in the analysis of a progesterone hormone longitudinal data set (download URL: http://www.lancs.ac.uk/diggle/). The innovation of this method is that it deals robustly with data sets that contain outliers and, by incorporating the correlation structure, improves the efficiency of estimation and variable selection.
Second, we propose a robust variable selection method that selects fixed and random effects jointly in linear mixed-effects models for longitudinal data. We use the EM algorithm in the computation. We show the oracle property of the proposed method based on the adaptive LASSO penalty.
In the simulations, we perturb the fixed effects, the random effects, and the response variable to examine the robustness of our approach, and the results show that the proposed method resists these various types of contamination. The proposed methods are illustrated in the analysis of the progesterone hormone data and the CD4 data (download URL: http://www.lancs.ac.uk/diggle/), and the results confirm the theoretical results. The innovation of this method is that, in addition to being robust, it selects variables jointly for fixed and random effects.
Third, we propose a two-stage variable selection method for fixed censored quantile regression in which the dimension of the covariates is ultra-high. We show that the first-stage penalized estimator with the LASSO penalty reduces the model from ultra-high dimension to a model that has the same size as the true model and contains the true model as a valid submodel, as long as the censoring probability can be estimated consistently. By applying the adaptive LASSO penalty to the reduced model, the second stage excludes the remaining irrelevant covariates, leading to an estimator that is consistent in variable selection and has the oracle property. The proposed method is illustrated by an analysis of the Boston housing price data (download URL: http://lib.stat.cmu.edu/datasets/boston-corrected.txt). The results show that our method handles high-dimensional data with fixed censoring very well.
This dissertation is divided into five chapters. Chapter 1 is the introduction, including the literature review, research background, research motivation, and research contents. Chapter 2 considers variable selection in robust regression models for longitudinal data, including estimation methods, algorithms, asymptotic properties, numerical simulation, and real data analysis.
Chapter 3 studies robust variable selection methods in linear mixed-effects models for longitudinal data, including estimation methods, algorithms, asymptotic theory, numerical simulation, and real data analysis. Chapter 4 focuses on ultra-high-dimensional variable selection for fixed censored quantile regression, covering the algorithm, the theory, numerical simulation, and real data analysis. Chapter 5 summarizes the dissertation and discusses directions for further research.
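The two-stage idea described in the abstract (a LASSO screening stage followed by an adaptive LASSO stage on the reduced model) can be illustrated with a small sketch. This is not the dissertation's estimator: as a loudly stated simplification, ordinary squared-error loss stands in for the censored quantile loss, censoring is ignored entirely, and the function names (`weighted_lasso`, `two_stage_select`, `make_toy_data`), the toy data, and the tuning values are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam * sum_j w_j * |b_j|.

    Assumption: squared-error loss replaces the censored quantile loss.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Average product of column j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def two_stage_select(X, y, lam1, lam2, gamma=1.0):
    """Stage 1: LASSO screening. Stage 2: adaptive LASSO on the kept covariates."""
    p = len(X[0])
    beta1 = weighted_lasso(X, y, lam1, [1.0] * p)
    kept = [j for j in range(p) if beta1[j] != 0.0]
    beta = [0.0] * p
    if not kept:
        return [], beta
    X_red = [[row[j] for j in kept] for row in X]
    # Adaptive weights: small stage-1 coefficients are penalized more heavily
    w = [1.0 / abs(beta1[j]) ** gamma for j in kept]
    beta2 = weighted_lasso(X_red, y, lam2, w)
    for k, j in enumerate(kept):
        beta[j] = beta2[k]
    selected = [j for j in kept if beta[j] != 0.0]
    return selected, beta


def make_toy_data(n=100, p=20, seed=0):
    """Hypothetical toy data: only covariates 0 and 1 affect the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + 0.5 * rng.gauss(0.0, 1.0) for row in X]
    return X, y
```

Calling `two_stage_select(*make_toy_data(), lam1=0.1, lam2=0.1)` returns the indices of the selected covariates together with the stage-2 coefficient vector. In practice the tuning parameters would be chosen by a data-driven criterion rather than fixed by hand, and the loss would need to account for the quantile level and the censoring.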
Keywords/Search Tags: Variable selection, Estimating function, Robust, Robustified likelihood, Quantile regression, Oracle property, Ultra-high-dimensional, Numerical simulation, Real data analysis