Variable Selection For High-Dimensional Gene Data

Posted on:2015-01-15

Degree:Master

Type:Thesis

Country:China

Candidate:Q Cheng

Full Text:PDF

GTID:2250330428976648

Subject:Probability theory and mathematical statistics

Abstract/Summary:


With the development of intelligent computer storage, cloud computing, and other new technologies, massive high-dimensional data have penetrated all fields of life, such as gene expression, risk control, combinatorial chemistry, and expert recommendation systems. Variable selection, a core issue in high-dimensional data analysis, has been in the spotlight in recent years. Effective variable selection can not only simplify the model but also improve its interpretability and prediction accuracy. This paper focuses on variable selection for large p, small n genetic data in case-control studies, in which all the predictors are discrete variables.

In high-dimensional data analysis, the usual approach to variable selection is to combine sufficient dimension reduction (SDR) with penalized methods. This paper uses marginal regression to rank the importance of the predictors, and the structural dimension is determined in a reasonable way. Using Yin's method for the large p, small n problem, we carry out variable selection in several models in which all predictors are discrete.

In the simulations, we discuss different models according to the degree of aggregation of the relevant variables and the independence of the predictors. Our method can be applied to all of these models, and it performs better when the predictors are independent than when they are dependent. For example, in a case-control study with p=3000, the true positive rate (TPR) reaches 86% when the sample size of the cases or controls is 100, and 99.8% when the sample size is 300.
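The ranking step and the TPR metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's procedure: it omits the SDR step and Yin's large p, small n method entirely, and it scores each predictor only by its absolute marginal correlation with the case/control label, which gives the same ordering as a standardized one-predictor-at-a-time least-squares fit. The function names, the 0/1/2 genotype coding, the effect size, and the number of active predictors are all assumptions made for illustration.

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute marginal correlation between each column of X and y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.zeros(X.shape[1])
    ok = denom > 0  # constant columns keep a score of 0
    scores[ok] = np.abs(Xc[:, ok].T @ yc) / denom[ok]
    return scores


def top_k(X, y, k):
    """Indices of the k predictors with the largest marginal scores."""
    order = np.argsort(-marginal_scores(X, y), kind="stable")
    return order[:k]


def true_positive_rate(selected, truly_active):
    """Fraction of the truly active predictors that were selected."""
    truly_active = set(int(j) for j in truly_active)
    hits = truly_active.intersection(int(j) for j in selected)
    return len(hits) / len(truly_active)


if __name__ == "__main__":
    # Hypothetical simulation: independent discrete predictors coded 0/1/2,
    # a logistic disease model, then equal numbers of cases and controls.
    rng = np.random.default_rng(2015)
    pool, p, n_per_group = 1000, 3000, 100
    active = np.arange(5)  # assumed: the first 5 predictors are active
    X_pool = rng.integers(0, 3, size=(pool, p))
    eta = 1.0 * (X_pool[:, active] - 1).sum(axis=1)  # assumed effect size
    is_case = rng.random(pool) < 1.0 / (1.0 + np.exp(-eta))
    rows = np.concatenate([
        np.flatnonzero(is_case)[:n_per_group],
        np.flatnonzero(~is_case)[:n_per_group],
    ])
    X, y = X_pool[rows], is_case[rows].astype(float)
    selected = top_k(X, y, k=50)
    print("TPR:", true_positive_rate(selected, active))
```

The cut-off `k=50` is arbitrary here; in practice the number of retained predictors would be chosen by the selection method itself, and the resulting TPR depends on the simulated model, so it should not be compared with the figures reported in the abstract.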

Keywords/Search Tags:

High-dimensional data, SDR, Large p small n, Variable selection, Marginal regression, TPR


Related items

1	Variable Selection And Feature Screening In High-dimensional Data
2	Variable Selection Methods In Statistical Models For Survival Data
3	The Parameter Estimation And Variable Selection In High Dimensional Collinearity Models
4	SCAD Regression Of High Dimensional With Small Sample Data
5	Robust Estimation And Variable Selection Of Two Kinds Of Semi-parametric Models Under High Dimension Data
6	Estimation And Variable Selection For Sparse High-dimensional Ordinary Differential Equation
7	A Study On The Selection Of High-Dimensional Data Variables
8	Variable Selection Based On PLS And Its Application On High Dimensional Data
9	Bayesian Variable Selection in Parametric and Semiparametric High Dimensional Survival Analysis
10	Model Selection For High-Dimensional Multinomial Logistic Regression Models