Application Of Random Survival Forest In Prognosis Prediction Of Lung Cancer Patients With Different Dimensions

Posted on: 2022-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: M Li
GTID: 2504306518978989
Subject: Public Health
Abstract/Summary:
Objective: Cox regression is the traditional and most commonly used survival prediction model, but it is constrained by the proportional hazards assumption and is not suitable for high-dimensional data, in which the number of variables far exceeds the number of cases. This study builds prognostic risk prediction models from the survival data of lung cancer patients, including data from a public database, and compares random survival forest (RSF) models with different split rules on low-dimensional clinical data and on high-dimensional clinical and gene expression data. The work adds to the prediction models available for clinical survival data and provides statistical support for more accurate prognosis prediction and personalized treatment of different clinical patients.

Methods: The basic principles of RSF model construction under different split rules are introduced, and the models are studied on data of different dimensions. The low-dimensional data come from the respiratory department of a tertiary hospital in Shanxi Province: a follow-up cohort of 342 hospitalized patients with a first diagnosis of lung cancer, with 12 predictors plus survival time and survival outcome. The high-dimensional data are survival data of lung cancer patients downloaded from the public database TCGA: 422 cases with 10 clinical variables and 330 gene expression variables, plus survival time and survival outcome. For both the low-dimensional and the high-dimensional data, with survival outcome and survival time as the response variables, prognostic prediction models are built using Cox regression (or Lasso-Cox), RSF, and an RSF based on maximally selected rank statistics (MSR-RF). The important influencing factors screened by the different models are discussed, and the predictive performance of the models is compared.

Results: On the low-dimensional survival data, the Cox model screened out 7 predictors (age, stage, degree, size, lni, treat and recuci), while the RSF models under every split rule screened out the same 5 predictors (treat, stage, size, degree and lni), with slightly different variable importance. On the training set, the iAUC of the Cox model was 0.742, lower than that of every RSF model. Among the RSF models, the log-rank split rule and MSR-RF had the highest iAUC (0.997), the log-rank split rule had the highest C-index (0.711), and MSR-RF had the lowest IBS (0.116, p<0.001). On the test set, the iAUC of MSR-RF was second only to that of the log-rank split rule, and its IBS was the lowest (0.141, p<0.001).

On the high-dimensional survival data, the Lasso-Cox model retained 19 variables, while each of the four RSF models retained more than 30. Unlike the four RSF models, the Lasso-Cox model performed very differently on the training set and the test set. Among the four RSF models, the log-rank split rule performed best, with better iAUC and IBS than the other three; the log-rank score rule performed worst, with the highest IBS (0.375, p<0.05). The MSR-RF model, introduced in this study as an improvement on RSF, showed only average predictive performance.

Conclusion: This study compares the predictive performance of different prognostic models on low-dimensional and high-dimensional data, and introduces MSR-RF, an improved model based on RSF, to build a prognostic prediction model for lung cancer patients. Across both dimensions, a comparison of the Cox model (or Lasso-Cox), the three built-in RSF models and MSR-RF, 6 types of models in total, shows that RSF can identify complex interactions between variables and that its discrimination and prediction accuracy are both good. MSR-RF performs well on low-dimensional data, but its advantages do not carry over to high-dimensional data. In the survival prediction of lung cancer patients, MSR-RF predicts the outcome more accurately and identifies the influencing factors in lower-dimensional clinical data; when the data dimensionality is high, however, the RSF with the log-rank split rule predicts better than the other models.
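The log-rank split rule compared in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names `logrank_statistic` and `best_logrank_split`, the `min_node_size` parameter and the toy data are hypothetical and are not the thesis's code or data. The sketch scores each candidate cut point of a single predictor by the absolute standardized two-sample log-rank statistic and keeps the cut with the largest value.

```python
from math import sqrt


def logrank_statistic(times, events, in_left):
    """Absolute standardized log-rank statistic for a left/right child split.

    times   : observed follow-up times
    events  : 1 if the event (death) was observed, 0 if censored
    in_left : True if the case would fall into the left child node
    """
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e})
    numerator = 0.0
    variance = 0.0
    for tj in event_times:
        at_risk = [i for i, t in enumerate(times) if t >= tj]
        y = len(at_risk)                                  # at risk overall
        y_left = sum(1 for i in at_risk if in_left[i])    # at risk, left child
        d = sum(1 for i in at_risk if times[i] == tj and events[i])
        d_left = sum(
            1 for i in at_risk
            if times[i] == tj and events[i] and in_left[i]
        )
        # Observed minus expected events in the left child.
        numerator += d_left - y_left * d / y
        if y > 1:
            p = y_left / y
            variance += p * (1 - p) * (y - d) / (y - 1) * d
    return abs(numerator) / sqrt(variance) if variance > 0 else 0.0


def best_logrank_split(x, times, events, min_node_size=1):
    """Return (cut, statistic) maximizing the log-rank statistic for predictor x."""
    best_cut, best_stat = None, 0.0
    values = sorted(set(x))
    # Candidate cut points are midpoints between adjacent distinct values.
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        in_left = [v <= cut for v in x]
        n_left = sum(in_left)
        if n_left < min_node_size or len(x) - n_left < min_node_size:
            continue
        stat = logrank_statistic(times, events, in_left)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy data, invented for illustration only (not from the thesis).
x = [1.0, 1.5, 2.0, 2.5, 5.0, 5.5, 6.0, 6.5]   # a single predictor
times = [30, 28, 26, 24, 6, 5, 4, 3]           # follow-up times
events = [0, 1, 0, 1, 1, 1, 1, 1]              # 1 = event, 0 = censored

cut, stat = best_logrank_split(x, times, events)
print(f"best cut: {cut}, log-rank statistic: {stat:.3f}")
```

A full RSF repeats this search over a random subset of predictors at each node of each bootstrap tree.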
Keywords/Search Tags: maximally selected rank statistics, RSF, Lasso-Cox, lung cancer, prediction