Risk Prediction Of Death In Patients With DLBCL Based On Probability Calibration And Ensemble Methods

Posted on:2022-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:S L Fan

Full Text:PDF

GTID:2504306518475394

Subject:Epidemiology and Health Statistics

Abstract/Summary:

Objective: Diffuse large B-cell lymphoma (DLBCL) is a common aggressive malignancy among the B-cell non-Hodgkin lymphomas. Although combining rituximab with standard treatment has improved the overall survival rate of patients, 30%-50% of cases are insensitive to chemotherapy or relapse after remission, and these patients eventually die. In addition, survival rates differ with clinical stage, tumor subtype, therapeutic regimen and other factors. Many existing studies focus on the DLBCL patient population as a whole, for example by analyzing prognostic factors, estimating overall survival rates, or evaluating treatments. Few clinical prediction studies are based on the clinicopathological characteristics of individual patients, and fewer still predict risk (probability). Accurate risk prediction is critical to precision medicine: it can help clinicians make optimal therapeutic decisions for individual patients, improving their clinical outcomes and extending their survival. This study therefore aims to develop an accurate risk (probability) prediction model of death for DLBCL by applying probability calibration to ensemble methods, and to provide a reference for doctors’ decision-making and patient treatment.

Methods: The data were collected from the electronic medical records of 406 patients with DLBCL diagnosed at one hospital from 2010 to 2017. Predictors were selected using the Cox proportional-hazards regression model, the logistic regression model and the variable importance analysis of the random forest algorithm. Five common models with good classification ability were used as the base models of the ensemble methods: naïve Bayes (NB), logistic regression (LR), random forest (RF), support vector machine (SVM) and feedforward neural network (FNN). Shape-restricted polynomial regression (RPR), which is broadly applicable, was selected as the probability calibration method. First, the base models were calibrated using RPR, and the results were compared with those of two classical probability calibration methods, Platt scaling (Platt) and isotonic regression (IsoReg). Then, three aggregation strategies (stacking, weighted averaging and simple averaging) were used to combine the base models calibrated by RPR and to generate the final risk prediction model. Finally, model performance was evaluated as the average result of a hold-out test repeated 300 times. Model evaluation was based on discrimination and calibration. AUC (the area under the ROC curve) was used to measure the discrimination (i.e., classification ability) of the models, and their calibration (i.e., accuracy of probability estimates) was assessed using the H-L (Hosmer-Lemeshow) test, expected calibration error (ECE) and maximum calibration error (MCE).

Results: Gender, stage, IPI (international prognostic index), KPS (Karnofsky performance status) and rituximab were significant factors influencing the death of DLBCL patients within 2 years. Among the 5 base models, the LR (ECE=9.517, MCE=24.400, P=0.226) and FNN (ECE=9.211, MCE=23.500, P=0.329) models were well calibrated, and their calibration errors were not further improved by any of the probability calibration methods. Probability calibration could significantly reduce the calibration error of the NB (ECE=14.206, MCE=38.900, P < 0.001), RF (ECE=13.569, MCE=36.000, P < 0.001) and SVM (ECE=13.225, MCE=32.100, P=0.014) models, whose initial prediction errors were large, and RPR performed best (NB-RPR: ECE=9.514, MCE=23.800, P=0.257; RF-RPR: ECE=10.070, MCE=26.550, P=0.198; SVM-RPR: ECE=10.893, MCE=26.300, P=0.140). The ensemble models that first calibrated the base models (Stacking-EN-C: ECE=8.983, MCE=21.265, P=0.350; ECE-EN-C: ECE=9.027, MCE=22.350, P=0.351; MCE-EN-C: ECE=9.159, MCE=22.300, P=0.345; SA-EN-C: ECE=9.295, MCE=23.300, P=0.314) performed better than the ensemble models without probability calibration (Stacking-EN: ECE=9.866, MCE=24.850, P=0.225; ECE-EN: ECE=9.228, MCE=24.500, P=0.186; MCE-EN: ECE=9.317, MCE=24.200, P=0.204; SA-EN: ECE=9.695, MCE=26.100, P=0.130), regardless of which aggregation strategy (i.e., stacking, weighted averaging or simple averaging) was used. Among the 28 models developed in this paper, the stacking model that first calibrated the base models by RPR (Stacking-EN-C) performed best (AUC=0.820, ECE=8.983, MCE=21.265, P=0.350).

Conclusion: Because the base models of ensemble methods may not generate accurate probability estimates in probability forecasting tasks, probability calibration was applied to ensemble methods in this paper in order to obtain a better ensemble effect and more accurate probability predictions. The results showed that probability calibration could further reduce the prediction error of ensemble models compared with ensemble models without probability calibration. The model developed in this study, based on probability calibration and ensemble methods, achieved the expected performance, which is important for doctors’ decision-making and personalized patient treatment. The modeling strategy proposed in this study may also be considered in future work.
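
The two calibration metrics named above, ECE and MCE, can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: predictions are grouped into 10 equal-width probability bins, the gap in each bin is the absolute difference between the observed death rate and the mean predicted risk, and values are multiplied by 100 on the guess that the reported figures are percentages. The function name `calibration_errors` is hypothetical, and the thesis may use a different binning scheme.

```python
import numpy as np

def calibration_errors(y_true, y_prob, n_bins=10):
    """Return (ECE, MCE) in percent, using equal-width probability bins.

    Assumption: 10 equal-width bins and percent scaling; the thesis
    abstract does not specify either choice.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Map each predicted risk to a bin index; a risk of exactly 1.0
    # is placed in the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece, mce = 0.0, 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between the observed event rate and the mean predicted risk.
        gap = abs(y_true[mask].mean() - y_prob[mask].mean())
        ece += mask.mean() * gap   # weight the gap by the bin's share of samples
        mce = max(mce, gap)        # keep the worst bin
    return 100.0 * ece, 100.0 * mce

# Toy usage with made-up labels and risks (not data from the study).
ece, mce = calibration_errors([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9])
print(f"ECE={ece:.3f}, MCE={mce:.3f}")
```

Under this definition, ECE summarizes the average miscalibration across all patients, while MCE reports the worst-calibrated risk band, which is why the two are reported side by side.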

Keywords/Search Tags:

DLBCL, probability calibration, ensemble methods, discrimination, calibration

Related items

1	Methods For The Calibration And Measurements Of Thoron Concentrations
2	Applications Of Second-order Tensor Calibration Methods In Chinese Herb Component Analysis And Research On Third-order Tensor Calibration Method
3	Analysis Of Drugs In Human Body Fluids Samples Using Three Dimensional Fluorescence Coupled With Multi-Way Calibration Methods
4	Classifying Recurrence Rate For Patients With DLBCL Using Imbalanced Data And Machine Learning Methods
5	Research On Online Geometric Calibration For Cone Beam CT System
6	Second-order Calibration Methods Applied To Determination Of Antitumor Drugs In Complex Systems
7	Using Second-order Tensorial Calibration Methods For Pharmaceutical Analysis In Traditional Chinese Medicine And Human Body Fluids
8	The Calibration Methods Of Near-infrared Vein Display System
9	Multivariate Calibration Methods For Quantitative Analysis Of Antineoplastic Drugs And Biological Macromolecules In Complex Matrices
10	The Applications Of Multi-way Calibration Methodologies To Quantitative Analysis Of Antihypertensives And Local Anesthesia In Complex Matrix System