| Objective: Diffuse large B-cell lymphoma (DLBCL) is a common aggressive B-cell non-Hodgkin lymphoma. Although combining rituximab with standard treatment has improved the overall survival rate of patients, 30%-50% of cases remain insensitive to chemotherapy or relapse after remission and eventually die. In addition, patient survival rates differ with clinical stage, tumor subtype, therapeutic regimen, and other factors. Many existing studies focus on the population of patients with DLBCL as a whole, for example analyzing prognostic factors, estimating overall survival rates, and evaluating treatments. Few clinical prediction studies are based on the clinicopathological characteristics of individual patients, and fewer still predict risk (probability). Accurate risk prediction is critical to precision medicine: it can help clinicians make optimal therapeutic decisions for individual patients, improving their clinical outcomes and extending their survival. This study therefore aims to develop an accurate model for predicting the risk (probability) of death in DLBCL by applying probability calibration to ensemble methods, and to provide a reference for doctors’ decision-making and patient treatment. Methods: Data were collected from the electronic medical records of 406 patients with DLBCL diagnosed at one hospital from 2010 to 2017. Predictors were selected using the Cox proportional-hazards regression model, the logistic regression model, and the variable importance analysis of the random forest algorithm. Five common models with good classification ability were used as the base models of the ensemble methods: naïve Bayes (NB), logistic regression (LR), random forest (RF), support vector machine (SVM), and feedforward neural network (FNN). In this work, shape-restricted polynomial regression (RPR), which has good generality, was selected as the probability calibration
method. First, the base models were calibrated using RPR, and the results were compared with those of two classical probability calibration methods, Platt (Platt scaling) and IsoReg (isotonic regression). Then, three aggregation strategies (stacking, weighted averaging, and simple averaging) were used to combine the RPR-calibrated base models and generate the final risk prediction model. Finally, model performance was evaluated as the average result of a hold-out test repeated 300 times. Model evaluation was based on discrimination and calibration. We used the AUC (area under the ROC curve) to measure the discrimination (i.e., classification ability) of the models and assessed their calibration (i.e., accuracy of probability estimates) using the H-L (Hosmer-Lemeshow) test, expected calibration error (ECE), and maximum calibration error (MCE). Results: Gender, stage, IPI (international prognostic index), KPS (Karnofsky performance status), and rituximab were significant factors influencing the death of DLBCL patients within 2 years. Among the 5 base models, the LR (ECE=9.517, MCE=24.400, P=0.226) and FNN (ECE=9.211, MCE=23.500, P=0.329) models were well calibrated, and their calibration errors were not further improved by any of the probability calibration methods. Probability calibration significantly reduced the calibration error of the NB (ECE=14.206, MCE=38.900, P < 0.001), RF (ECE=13.569, MCE=36.000, P < 0.001), and SVM (ECE=13.225, MCE=32.100, P=0.014) models, whose initial prediction errors were large, and RPR performed best (NB-RPR: ECE=9.514, MCE=23.800, P=0.257; RF-RPR: ECE=10.070, MCE=26.550, P=0.198; SVM-RPR: ECE=10.893, MCE=26.300, P=0.140). Among the ensemble models, those that first calibrated the base models (Stacking-EN-C: ECE=8.983, MCE=21.265, P=0.350; ECE-EN-C: ECE=9.027, MCE=22.350, P=0.351; MCE-EN-C: ECE=9.159, MCE=22.300, P=0.345; SA-EN-C: ECE=9.295, MCE=23.300, P=0.314) performed better than the ensemble models without probability calibration (Stacking-EN:
ECE=9.866, MCE=24.850, P=0.225; ECE-EN: ECE=9.228, MCE=24.500, P=0.186; MCE-EN: ECE=9.317, MCE=24.200, P=0.204; SA-EN: ECE=9.695, MCE=26.100, P=0.130), regardless of which aggregation strategy (i.e., stacking, weighted averaging, or simple averaging) was used. Among the 28 models developed in this paper, the stacking model that first calibrated the base models with RPR (Stacking-EN-C) performed best (AUC=0.820, ECE=8.983, MCE=21.265, P=0.350). Conclusion: Because the base models of ensemble methods may not generate accurate probability estimates in probability forecasting tasks, probability calibration was applied to ensemble methods in this paper to obtain a better ensemble effect and more accurate probability predictions. The results showed that probability calibration further reduced the prediction error of ensemble models compared with ensemble models without probability calibration. The model developed in this study, based on probability calibration and ensemble methods, achieved the expected performance, which is critical for doctors’ decision-making and personalized patient treatment. The modeling strategy proposed in our study may also be considered in future work. |
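
To make the two calibration metrics and the simple-averaging strategy named in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes 10 equal-width probability bins and reports errors in percent to match the scale of the values quoted above; the abstract specifies neither choice. The function names `calibration_errors` and `simple_average` are hypothetical, and RPR calibration itself is not reproduced here.

```python
from typing import List, Tuple


def calibration_errors(
    probs: List[float], labels: List[int], n_bins: int = 10
) -> Tuple[float, float]:
    """Return (ECE, MCE) in percent.

    Assumption: equal-width bins over [0, 1]; the binning scheme used in
    the study is not stated in the abstract.
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")

    # Per-bin count, sum of predicted probabilities, and sum of observed outcomes.
    counts = [0] * n_bins
    prob_sums = [0.0] * n_bins
    label_sums = [0] * n_bins
    for p, y in zip(probs, labels):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[k] += 1
        prob_sums[k] += p
        label_sums[k] += y

    total = len(probs)
    ece = 0.0
    mce = 0.0
    for n, ps, ys in zip(counts, prob_sums, label_sums):
        if n == 0:
            continue  # empty bins contribute nothing
        # Gap between observed event rate and mean predicted probability.
        gap = abs(ys / n - ps / n)
        ece += (n / total) * gap  # ECE: bin-size-weighted mean gap
        mce = max(mce, gap)       # MCE: largest gap over non-empty bins
    return 100.0 * ece, 100.0 * mce


def simple_average(model_probs: List[List[float]]) -> List[float]:
    """Simple-averaging ensemble: mean predicted probability per patient."""
    return [sum(col) / len(col) for col in zip(*model_probs)]
```

Under these assumptions, a calibrated ensemble would be scored by passing the already calibrated per-model probabilities through `simple_average` and then calling `calibration_errors` on the result with the observed 2-year outcomes.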