Variable Selection For High-Dimensional Gene Data

Posted on: 2015-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q Cheng
GTID: 2250330428976648
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the development of intelligent computer storage, cloud computing, and other new technologies, massive high-dimensional data have penetrated all fields of life, such as gene expression, risk control, combinatorial chemistry, and expert recommender systems. Variable selection, a core issue in high-dimensional data analysis, has been in the spotlight in recent years. Effective variable selection not only simplifies the model but also improves its interpretability and prediction accuracy. This thesis focuses on variable selection for large p, small n genetic data in case-control studies, in which all the predictors are discrete variables.

In high-dimensional data analysis, the usual approach to variable selection is to combine sufficient dimension reduction (SDR) with penalized methods. This thesis uses marginal regression to rank the importance of the predictors, and the structural dimension is determined in a reasonable way. Using Yin's method for the large p, small n problem, we carry out variable selection in several models in which all predictors are discrete.

In the simulations, we discuss several models that differ in the degree of aggregation of the relevant variables and in the independence of the predictors. Our method applies to all of these models, and the models with independent predictors perform better than those with dependent predictors. For example, in a case-control study with p = 3000, the true positive rate (TPR) reaches 86% when the sample size of each of the case and control groups is 100, and 99.8% when the sample size is 300.
Keywords/Search Tags: High-dimensional data, SDR, Large p small n, Variable selection, Marginal regression, TPR
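
The abstract does not give the thesis's algorithm, so the following is only a minimal sketch of two ideas it names: ranking discrete predictors by marginal regression, and scoring a selection by TPR. Everything specific here is an assumption made for illustration: the function names, the Binomial(2, 0.3) genotype-style coding, the frequency shift for relevant predictors in the case group, and the choice of keeping the top 20 predictors. The SDR step and the choice of structural dimension are not shown.

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute marginal correlation of each predictor column with y.

    For a standardized predictor this equals the absolute slope of the
    one-variable regression of standardized y on that predictor.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sd = Xc.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns carry no signal and get score 0
    return np.abs(Xc.T @ yc) / (len(y) * sd * yc.std())


def top_k(scores, k):
    """Indices of the k largest scores, largest first."""
    return np.argsort(scores)[::-1][:k]


def true_positive_rate(selected, active):
    """Fraction of truly relevant predictors that were selected."""
    active = set(int(j) for j in active)
    hits = active & set(int(j) for j in selected)
    return len(hits) / len(active)


def simulate_case_control(n_per_group, p, active, shift, rng):
    """Hypothetical simulator: independent predictors coded 0/1/2.

    Controls use frequency 0.3 for every predictor; cases use
    0.3 + shift for the relevant (active) predictors only.
    """
    freq = np.full(p, 0.3)
    controls = rng.binomial(2, freq, size=(n_per_group, p))
    case_freq = freq.copy()
    case_freq[list(active)] += shift
    cases = rng.binomial(2, case_freq, size=(n_per_group, p))
    X = np.vstack([controls, cases])
    y = np.repeat([0, 1], n_per_group)  # 0 = control, 1 = case
    return X, y


rng = np.random.default_rng(0)
active = [0, 1, 2, 3, 4]  # assumed relevant predictors
X, y = simulate_case_control(100, 500, active, 0.15, rng)
selected = top_k(marginal_scores(X, y), k=20)
print("TPR:", true_positive_rate(selected, active))
```

Raising `n_per_group` in this sketch is one way to explore how the TPR of marginal ranking changes with sample size. The numbers it prints are not comparable to the 86% and 99.8% reported above, which come from the thesis's own models and procedure.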