(Ⅰ)SIS-based Variable Selection Method And The Application In Survival Analysis Of Ultra High-dimensional Data

Posted on: 2014-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: X X Zhang
GTID: 2254330398962098
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Ultra high-dimensional gene or protein data, generated by DNA microarray and protein spectrum technologies and used to predict outcomes for cancer patients, can no longer be analyzed with the conventional Cox proportional hazards model. This thesis explored the advantages and disadvantages of the L1-penalized Cox model, the elastic net Cox (EN-Cox) model, the SIS-Cox model and the ISIS-Cox model through a simulation study and an analysis of David Beer's (2002) lung adenocarcinoma data. The aim was to reveal the relationship between the time to death (or another endpoint) and the biological data, and thereby to obtain more accurate diagnosis and prognosis and to improve treatment.

Methods: The basic principles of the L1-penalized Cox model, the EN-Cox model, the SIS-Cox model and the ISIS-Cox model were introduced. A simulation study reproducing the characteristics of ultra high dimension, strong correlation and small sample size was carried out to investigate how well these models screen variables. David Beer's (2002) lung adenocarcinoma data set was then analyzed.

Results: Model estimation was evaluated with four criteria: $\|\hat\beta-\beta\|_1$, $\|\hat\beta-\beta\|_2^2$, P and MMS. The first two measure estimation error, with $\|\hat\beta-\beta\|_1=\sum_{j=1}^{p}|\hat\beta_j-\beta_j|$ and $\|\hat\beta-\beta\|_2^2=\sum_{j=1}^{p}|\hat\beta_j-\beta_j|^2$. P is the proportion of the 100 repetitions in which the selected model included all of the important variables. MMS describes the size of the final model over the 100 repetitions. The simulations showed that, when the covariates were mutually independent, the L1-penalized Cox, EN-Cox, SIS-Cox and ISIS-Cox models were all able to identify every important variable, but the models selected by the L1-penalized Cox and EN-Cox models were several times larger, or more, than those selected by the ISIS-Cox model. The model size of the (I)SIS-Cox model was closest to the true number of variables, and its estimation accuracy was very good. When the variables were serially correlated, the SIS-Cox model performed poorly, while the ISIS-Cox model identified the important variables better than the other models. When the variables had a complicated serial correlation structure, the ISIS-Cox model was still the best at identifying the correct model. In the real-data example of ultra high-dimensional variable screening, the ISIS-Cox model was more reliable than the other three models.

Conclusion: The L1-penalized Cox model and the SIS-Cox model performed poorly in the survival analysis of ultra high-dimensional data. Although the EN-Cox model was able to handle collinearity, it did not reduce the dimension to an acceptable level and its estimation accuracy was poor. The ISIS-Cox model identified the correct model very well, had high estimation accuracy, and was more interpretable than the other models. It was therefore the most suitable model for small-sample, ultra high-dimensional survival data.
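The four evaluation criteria described in the Results can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis code: the function names, the toy coefficient vectors and the toy selected sets are all hypothetical, and MMS is computed as the median model size, which is an assumption (the abstract only says it describes the final model size over the repetitions).

```python
import numpy as np


def l1_error(beta_hat, beta_true):
    """Sum over j of |beta_hat_j - beta_j| (the L1 estimation error)."""
    diff = np.asarray(beta_hat, dtype=float) - np.asarray(beta_true, dtype=float)
    return float(np.sum(np.abs(diff)))


def squared_l2_error(beta_hat, beta_true):
    """Sum over j of |beta_hat_j - beta_j|^2 (the squared L2 estimation error)."""
    diff = np.asarray(beta_hat, dtype=float) - np.asarray(beta_true, dtype=float)
    return float(np.sum(diff ** 2))


def selection_summary(selected_sets, important):
    """Return (P, MMS) over a list of selected variable index sets.

    P   : proportion of repetitions whose selected set contains every
          important variable.
    MMS : median selected-model size (ASSUMPTION: the abstract does not
          state which summary of model size is used).
    """
    important = set(important)
    covered = [important <= set(s) for s in selected_sets]
    sizes = [len(set(s)) for s in selected_sets]
    return float(np.mean(covered)), float(np.median(sizes))


# Toy usage with made-up numbers (not results from the thesis).
beta_true = [1.0, -1.0, 0.0, 0.0]
beta_hat = [0.5, -1.0, 0.5, 0.0]
print(l1_error(beta_hat, beta_true))          # 1.0
print(squared_l2_error(beta_hat, beta_true))  # 0.5

# Three hypothetical repetitions; variables 0 and 1 are the important ones.
runs = [{0, 1, 2}, {0, 1}, {0, 2, 3}]
p, mms = selection_summary(runs, important={0, 1})
print(p, mms)
```

In a real simulation, `runs` would hold the variable sets selected in each of the 100 repetitions, and the two error functions would be averaged or otherwise summarized over the same repetitions.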
Keywords/Search Tags: ultra high-dimensional biological data, survival analysis, L1-penalized Cox model, EN-Cox model, SIS-Cox model, ISIS-Cox model