
Prognostic Prediction of Colorectal Cancer Based on Super Learner

Posted on: 2020-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Li
GTID: 2404330572484234
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Colorectal cancer is one of the most common malignant tumors threatening human health, and the disease burden it causes is increasing. In China, colorectal cancer ranks third in incidence and second in mortality among men, and third in both incidence and mortality among women. Experience has shown that accurately judging the prognosis of colorectal cancer patients and its influencing factors, and adjusting treatment and intervention programs in time, are effective strategies for reducing mortality and disease burden. However, current clinical prognostic assessment of colorectal cancer is based only on TNM staging (depth of tumor invasion, number of regional lymph node metastases, and presence or absence of distant metastasis), and because it also relies on physicians' experience, its accuracy is often limited. To make prognostic judgment more accurate and objective, other common prognostic indicators have been added to TNM staging, and prognostic models for colorectal cancer have been built with conventional single prediction models (such as Weibull regression, the Cox proportional hazards regression model, and the machine-learning-based random survival forest). However, each single prediction model has its own limitations: across different populations and different predictors, their predictive performance often differs greatly, which seriously undermines the accuracy of external (extrapolated) prediction.

To improve the accuracy and generalizability of colorectal cancer prognosis prediction, this study combines the Cox proportional hazards model, random survival forest, additive hazards model, Weibull regression model, exponential regression model, lognormal regression model, log-logistic regression model, and conditional inference forest within the recently developed Super Learner framework to build a new generation of prognostic prediction model for colorectal cancer. First, simulation studies systematically compared the predictive accuracy and stability of the Super Learner and traditional single prediction models across different types of data. Then, using six groups of real-world prognostic cohorts from different ethnic groups and regions, Super Learner ensemble models and single prediction models were built, and their predictive performance was compared and validated.

The results are as follows:
1. The simulations show that the Super Learner predicts better when the data structure is relatively simple and there are few predictors: its mean C-index was 0.715, and the mean of the composite calibration index |1 - O/E| (based on the observed-to-expected ratio, O/E) was 0.069. The Super Learner ensemble kept its O/E ratio robustly close to 1, whereas the single prediction models showed unstable calibration (O/E) when extrapolating to data with different structures.
2. Across the six groups of real-world colorectal cancer prognostic cohorts, the Super Learner ensemble showed more robust predictive performance and stable generalization to external data, whereas the single prediction models were less robust and less consistent across heterogeneous cohorts with different distributional characteristics. Specifically:
(1) In the first group (the colorectal cancer prognostic cohort built by our research group as the training set and TCGA-COADREAD as the validation set), the C-index ranking was lognormal regression model (0.819), log-logistic regression model (0.815), Super Learner (0.813), placing the Super Learner third; the O/E ranking was Cox proportional hazards model (1.086), exponential regression model (1.087), Weibull regression model (1.088), conditional inference forest (1.111), Super Learner (1.113), placing the Super Learner fifth.
(2) In the second group (the colon cancer prognostic cohort from the R "survival" package as the training set and the colon cancer prognostic cohort built by our group as the validation set), the C-index ranking was additive hazards model (0.819), lognormal regression model (0.730), log-logistic regression model (0.729), Weibull regression model (0.727), exponential regression model (0.727), Super Learner (0.723), placing the Super Learner sixth; the O/E ranking was conditional inference forest (1.213), Weibull regression model (1.216), exponential regression model (1.235), additive hazards model (1.252), lognormal regression model (1.269), log-logistic regression model (1.277), Super Learner (1.292), placing the Super Learner seventh.
(3) In the third group (the colorectal cancer prognostic cohort built by our research group as the training set and TCGA-COADREAD as the validation set), the Super Learner's C-index was 0.816, ranking first; the O/E ranking was Weibull regression model (1.053), log-logistic regression model (1.054), exponential regression model (1.054), logarithmic regression model (0.816), conditional inference forest (1.070), lognormal regression model (1.071), Super Learner (1.077), placing the Super Learner sixth.
(4) In the fourth group (GSE40967 from the GEO database as the training set and GSE41258 as the validation set), the C-index ranking was conditional inference forest (0.822), additive hazards model (0.820), Super Learner (0.818); the O/E ranking was random survival forest (0.929), conditional inference forest (0.886), Super Learner (0.878), placing the Super Learner third.
(5) In the fifth group (GSE40967 from the GEO database as the training set and TCGA-COAD as the validation set), the C-index ranking was additive hazards model (0.790) and Super Learner (0.820); the O/E ranking was conditional inference forest (0.981), Cox proportional hazards model (0.980), random survival forest (0.979), additive hazards model (0.975), Super Learner (0.973), placing the Super Learner fifth.
(6) In the sixth group (the colon cancer prognostic cohort built by our research group as the training set and GSE40967 from the GEO database as the validation set), the C-index ranking was conditional inference forest (0.733), Super Learner (0.725); the O/E ranking was lognormal regression model (0.998), Super Learner (0.990), placing the Super Learner second.
(7) In internal validation, the mean C-index ranking was random survival forest (0.929), conditional inference forest (0.800), Super Learner (0.795), placing the Super Learner third; the ranking by the composite index |1 - O/E| was conditional inference forest (0.041), random survival forest (0.042), additive hazards model (0.042), Super Learner (0.046), placing the Super Learner fourth. In external validation, the Super Learner's mean C-index was 0.780, ranking first; the ranking by |1 - O/E| was random survival forest (0.079), Weibull regression model (0.081), exponential regression model (0.083), additive hazards model (0.071), Super Learner (0.092), placing the Super Learner fifth.

Conclusions:
1. In simulation scenarios with a relatively simple data structure and relatively few predictors, the Super Learner predicts well.
2. In more complex real-world cohorts, single prediction models are often unstable, and their accuracy fluctuates considerably across datasets, whereas the Super Learner ensemble shows more robust predictive performance in every setting and generalizes stably to external data.
3. The colorectal cancer prognostic model based on the Super Learner ensemble strategy is robust, accurate, and generalizes well to external data, providing a new method for clinical prognosis prediction in colorectal cancer.
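The results above are reported with two metrics: Harrell's C-index (discrimination) and the observed-to-expected event ratio O/E with its composite index |1 - O/E| (calibration). The abstract does not give the exact definitions or software used, so the following is only a minimal pure-Python sketch under stated assumptions: the C-index uses Harrell's pairwise definition with pairs of tied follow-up times skipped; "observed" is taken as 1 minus the Kaplan-Meier survival at a chosen horizon and "expected" as the mean predicted event probability at that horizon. The function `super_learner_weights` is a hypothetical, much simplified stand-in for the Super Learner's combination step: it searches a coarse grid of convex weights for the highest C-index on out-of-fold predictions, whereas the Super Learner proper chooses weights by minimizing a cross-validated loss.

```python
"""Sketch of the evaluation metrics and a simplified ensemble step.

Assumptions (not taken from the thesis): Harrell's C with tied follow-up
times skipped; O/E = Kaplan-Meier event proportion at a horizon divided by
the mean predicted event probability at that horizon; ensemble weights
chosen on a coarse simplex grid (hypothetical simplification).
"""
from itertools import product


def harrell_c_index(times, events, risks):
    """Share of comparable pairs whose predicted risks are correctly ordered."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # higher risk failed first: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def km_event_proportion(times, events, horizon):
    """1 - Kaplan-Meier survival at `horizon` (the 'observed' event rate)."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
    return 1.0 - survival


def oe_ratio(times, events, predicted_event_prob, horizon):
    """Observed / expected events; values near 1 indicate good calibration."""
    observed = km_event_proportion(times, events, horizon)
    expected = sum(predicted_event_prob) / len(predicted_event_prob)
    return observed / expected


def calibration_index(oe):
    """Composite index |1 - O/E| (0 is ideal)."""
    return abs(1.0 - oe)


def combine(candidate_preds, weights):
    """Convex combination of the candidates' predicted event probabilities."""
    return [sum(w * p[k] for w, p in zip(weights, candidate_preds))
            for k in range(len(candidate_preds[0]))]


def super_learner_weights(times, events, candidate_preds, steps=4):
    """Hypothetical simplified combination step: grid search over convex weights.

    `candidate_preds` should be out-of-fold (cross-validated) predictions,
    one list per candidate model, all on the same event-probability scale.
    """
    grid = [tuple(c / steps for c in combo)
            for combo in product(range(steps + 1), repeat=len(candidate_preds))
            if sum(combo) == steps]
    return max(grid, key=lambda w: harrell_c_index(
        times, events, combine(candidate_preds, w)))


if __name__ == "__main__":
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    good = [0.9, 0.7, 0.5, 0.1]  # higher predicted risk for earlier events
    bad = [0.1, 0.5, 0.7, 0.9]   # reversed ordering
    print(harrell_c_index(times, events, good))
    print(super_learner_weights(times, events, [good, bad], steps=2))
    oe = oe_ratio(times, events, [0.5] * 4, horizon=5)
    print(oe, calibration_index(oe))
```

The grid search is used here only because it is short and dependency-free; it grows combinatorially with the number of candidate models, so a real analysis with eight candidates would use a dedicated meta-learner (for example, non-negative least squares or a constrained optimizer) on properly cross-validated predictions.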
Keywords/Search Tags: colorectal cancer, prognosis prediction, Super Learner