| Objective: This retrospective study explores the relationships of clinical and pathological features, immunosuppressive therapy, and Traditional Chinese Medicine (TCM) syndromes and treatment with the prognosis of focal crescentic immunoglobulin A nephropathy (IgAN). It also explores how to select the better-performing model among prognostic models built with different machine learning (ML) algorithms, with the goal of helping clinicians predict prognosis early and providing a basis for individualized treatment. Specifically, we aimed to develop an ML-based model to predict the prognosis of IgAN with focal crescent formation and no obvious chronic lesions (glomerulosclerosis < 25%).

Methods: We retrospectively studied patients with biopsy-confirmed IgAN at Guangdong Provincial Hospital of Traditional Chinese Medicine and Shaanxi Provincial Hospital of Traditional Chinese Medicine from 2005 to 2017. The outcome was a composite endpoint comprising any of the following: an eGFR decline of ≥ 50%, an eGFR decline of ≥ 15% within one year, an eGFR decline of ≥ 30% within two years, doubling of serum creatinine, progression to end-stage renal disease (ESRD; eGFR < 15 mL/min/1.73 m²), or death. Clinical, pathological, TCM syndrome and treatment characteristics were compared between patients who did and did not reach the composite endpoint. Random forest feature-importance ranking was applied to the 81 collected variables to identify predictors closely related to the prognosis of focal crescentic IgAN. Prediction models were then built with three ML algorithms: support vector machine, random forest and naive Bayes. Predictive performance was evaluated with threefold cross-validation (two folds used for training and one for validation) using accuracy, precision, recall, F1 score, the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) and calibration curves, and the best-performing model was selected.

Results: A total of 374 patients with focal crescentic IgAN without obvious chronic lesions were included. After a follow-up of 32.99 months, 66 patients (17.6%) reached the composite endpoint. A total of 81 epidemiological, clinical, pathological and TCM variables were collected.

1. General data: There were 200 women (53.5%). The proportion of women was significantly higher in the endpoint group than in the non-endpoint group (66.7% vs 50.6%, P = 0.018). Precipitating factors were present in 79 patients (21.1%), most commonly a preceding respiratory tract infection (46 cases, 12.3%), followed by fatigue (15 cases, 4%). Fatigue as a precipitating factor was significantly more frequent in the endpoint group than in the non-endpoint group (9.1% vs 2.9%, P = 0.02). The most common initial presentation was abnormal findings on routine examination in asymptomatic patients (155 cases, 41.4%), followed by edema (57 cases, 15.2%), gross hematuria (57 cases, 15.2%) and increased urinary foam (50 cases, 13.4%). Edema as the initial symptom was significantly more frequent in the endpoint group than in the non-endpoint group (25.8% vs 13%, P = 0.009).

2. Laboratory data: Before renal biopsy, urinary protein in the 374 patients was 0.88 (0.43-1.57) g/d. Urinary protein was higher in the endpoint group than in the non-endpoint group (1.19 [0.48-2.32] vs 0.81 [0.43-1.49] g/d, P = 0.034). Across all 374 patients, serum total protein was 67 (62-71.53) g/L and serum albumin was 40.9 (37.18-43.9) g/L. Serum total protein (P = 0.031) and serum
albumin (P = 0.005) were significantly lower in the endpoint group than in the non-endpoint group. Among electrolytes, serum calcium was significantly lower in the endpoint group than in the non-endpoint group (2.21 [2.11-2.29] vs 2.24 [2.17-2.33] mmol/L, P = 0.034).

3. Pathological data: Among the Oxford MEST-C classification items, large crescents differed significantly between the two groups (P = 0.041), occurring in 12 cases (18.2%) in the endpoint group vs 41 cases (13.3%) in the non-endpoint group.

4. Western medicine treatment data: Renin-angiotensin-aldosterone system inhibitors were the most commonly used treatment (247 cases, 66%), including angiotensin-converting enzyme inhibitors (33 cases, 8.8%) and angiotensin Ⅱ receptor antagonists (217 cases, 58%). Glucocorticoids were used in 130 cases (34.8%) and immunosuppressants in 140 cases (37.4%). Glucocorticoids were used more often in the endpoint group than in the non-endpoint group (31 patients, 47% vs 99 patients, 32.1%; P = 0.022), as was immunosuppressive therapy (32 patients, 48.5% vs 108 patients, 35.1%; P = 0.041).

5. TCM data: Of the 34 TCM symptoms collected, the six most common in these IgAN patients were fatigue (308 cases, 82.4%), lumbar pain (157 cases, 42.0%), dry mouth (125 cases, 33.4%), bitter taste in the mouth (55 cases, 14.7%), edema (55 cases, 14.7%) and nocturia (46 cases, 12.3%). Aversion to cold with cold limbs was more frequent in the endpoint group (4 cases, 6.1%) than in the non-endpoint group (4 cases, 1.3%; P = 0.036). Edema occurred in 17 patients (25.8%) in the endpoint group, compared with 38 patients (12.3%) in the non-endpoint
group (P = 0.005). Regarding tongue coating, the three most common types were thin white coating (114 cases, 30.5%), white greasy coating (83 cases, 22.2%) and yellowish, slightly greasy coating (65 cases, 17.4%). White greasy coating was less frequent in the endpoint group (6 cases, 9.1%) than in the non-endpoint group (77 cases, 25%; P = 0.005). In TCM syndrome differentiation, the three most common syndromes were spleen-kidney qi deficiency (223 cases, 59.6%), qi-yin deficiency (107 cases, 28.6%) and spleen-kidney yang deficiency (19 cases, 5.1%). Spleen-kidney qi deficiency syndrome was less frequent in the endpoint group than in the non-endpoint group (31 cases, 47% vs 192 cases, 62.3%; P = 0.021), whereas spleen-kidney yang deficiency syndrome was more frequent (8 cases, 12.1% vs 11 cases, 3.6%; P = 0.009). Among concurrent syndromes, phlegm-dampness syndrome was less frequent in the endpoint group (6 cases, 9.1%) than in the non-endpoint group (58 cases, 18.8%), a trend toward statistical significance (P = 0.057). Regarding TCM treatment, 340 patients (90.9%) received TCM decoctions, including spleen-invigorating and kidney-tonifying herbs (281 cases, 75.1%), herbs that promote blood circulation and remove blood stasis (173 cases, 46.3%), heat-clearing and blood-cooling herbs (80 cases, 21.4%) and blood-cooling hemostatic herbs (49 cases, 13.1%). Haikun Shenxi capsules were used more often in the endpoint group (4 patients, 6.1%) than in the non-endpoint group (4 patients, 1.3%; P = 0.036). Heat-clearing and blood-cooling herbs were used by 20 patients (30.3%) in the endpoint group vs 60 patients (19.5%) in the non-endpoint group (P = 0.052), showing a
trend toward statistical significance.

Random forest feature-importance ranking of the 81 collected variables identified baseline eGFR, serum albumin, urinary protein, serum triglyceride and urinary red blood cell count as the five most important predictors. Based on this ranking and clinical judgment, 11 features were selected for model construction: baseline eGFR, serum albumin, urinary protein, serum triglyceride, urinary red blood cell count, serum creatinine, age, serum uric acid, proportion of crescents, large crescents and spleen-kidney yang deficiency. Among the models built with the three widely used ML algorithms (support vector machine, random forest and naive Bayes), the support vector machine model achieved the highest accuracy, precision, recall, F1 score, AUROC, AUPRC and lift. In the IgAN prognostic model including both crescent and spleen-kidney yang deficiency predictors, the AUROCs of the support vector machine, random forest and naive Bayes models were 0.8947, 0.7570 and 0.6711, respectively, and the AUPRCs were 0.805, 0.547 and 0.434. In the model without crescent predictors, the corresponding AUROCs were 0.8351, 0.7816 and 0.7070, and the AUPRCs were 0.682, 0.531 and 0.492. In the model without spleen-kidney yang deficiency, the AUROCs were 0.8947, 0.7237 and 0.695, and the AUPRCs were 0.805, 0.466 and 0.469.

Conclusion: Based on random forest feature importance, 11 features (baseline eGFR, serum albumin, urinary protein, serum triglyceride, urinary red blood cell count, serum creatinine, age, serum uric acid, proportion of crescents, large crescents and spleen-kidney yang deficiency) had an important influence on the model's prognostic predictions. All three ML algorithms (support vector machine, random forest and naive Bayes) showed good predictive performance in patients with focal crescentic IgAN without obvious chronic lesions, and the support vector machine performed best. In addition, the prognostic model including both crescent and spleen-kidney yang deficiency predictors performed better than the model without crescent predictors and the model without spleen-kidney yang deficiency. |
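The evaluation workflow described in the Methods (threefold cross-validation scored with accuracy, precision, recall, F1 score, AUROC and AUPRC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: labels are coded 1 for reaching the composite endpoint and 0 otherwise, each model outputs a predicted probability, the 0.5 classification threshold is an assumption, AUPRC is estimated as average precision (one common estimator among several), and all function names and the `fit_predict` interface are hypothetical. The toy values are illustrative only, not patient data.

```python
"""Hypothetical sketch: threefold cross-validation and the metrics named in the abstract."""
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]
# Hypothetical interface: train on (X_train, y_train), return probabilities for X_val.
FitPredict = Callable[[List[Row], List[int], List[Row]], List[float]]


def stratified_three_folds(labels: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Split indices into 3 folds so each fold keeps a similar endpoint rate."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[], [], []]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % 3].append(i)  # deal each class round-robin into the folds
    return folds


def threshold_metrics(y: Sequence[int], p: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 at a fixed cut-off (0.5 is an assumption)."""
    tp = sum(1 for s, t in zip(p, y) if s >= threshold and t == 1)
    fp = sum(1 for s, t in zip(p, y) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(p, y) if s < threshold and t == 1)
    tn = len(y) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y), "precision": precision,
            "recall": recall, "f1": f1}


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    """AUPRC estimated as average precision; tied scores keep their input order."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this positive
    return total / hits if hits else 0.0


def cross_validate(X: List[Row], y: List[int], fit_predict: FitPredict,
                   seed: int = 0) -> float:
    """Train on two folds, score the held-out fold, and average AUROC over the 3 rounds."""
    folds = stratified_three_folds(y, seed)
    scores = []
    for k in range(3):
        val = folds[k]
        train = [i for j in range(3) if j != k for i in folds[j]]
        probs = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in val])
        scores.append(auroc([y[i] for i in val], probs))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative toy values only (1 = reached the composite endpoint).
    y = [1, 0, 1, 0, 0, 1, 0, 0]
    p = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
    print(round(auroc(y, p), 4))              # 0.8667
    print(round(average_precision(y, p), 4))  # 0.8056
    print(threshold_metrics(y, p))
```

The folds in this sketch are stratified because only 17.6% of patients reached the endpoint, and an unstratified split could leave a validation fold with very few events; the abstract does not state whether the study's folds were stratified. In practice, `fit_predict` would wrap a support vector machine, random forest or naive Bayes classifier from an ML library.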