Objective: To establish non-invasive diagnostic and prognostic prediction models for immunoglobulin A nephropathy (IgAN) based on clinical indicators, to identify the optimal prediction model, and to discover sensitive indicators for IgAN diagnosis and prognosis. The aim is to provide a basis for the non-invasive diagnosis and prognostic assessment of IgAN in clinical practice.

Methods: (1) Using historical data analysis, a descriptive epidemiological method, we collected case data of patients with kidney disease confirmed by renal biopsy at a municipal tertiary hospital in Gansu from February 2018 to June 2022. (2) The distribution of renal disease types in the region was analyzed on the basis of renal biopsy pathology, and the general characteristics of IgAN and non-IgAN patients were described and compared. Diagnostic predictor variables were screened using clinician recommendations, between-group comparisons, and correlation analysis. Non-invasive diagnostic prediction models were then developed using Lasso-logistic regression, a back-propagation neural network (BPNN), and extreme gradient boosting (XGBoost). ROC curves, confusion matrices, and calibration curves were used to evaluate the diagnostic models, and ROC curves were then used to select sensitive indicators for IgAN diagnosis. (3) Patients diagnosed with IgAN among the above were followed up until October 31, 2022, and the incidence density of adverse prognostic outcomes was calculated from baseline and follow-up data. Prognostic predictor variables were screened in the same way (clinician recommendations, between-group comparisons, and correlation analysis). Three prognostic prediction models were developed using Cox proportional hazards regression, BPNN, and XGBoost, their performance was evaluated, and sensitive indicators of IgAN prognosis were selected.

Results: (1) Case data were collected from 603 patients with kidney disease, of whom 514 were included in the diagnostic prediction study. Twelve kidney disease types were identified; the most frequently detected were IgAN (25.88%), membranous nephropathy (23.15%), glomerulosclerosis (17.70%), diabetic nephropathy (16.93%), and minimal change disease (8.37%). Primary glomerular disease accounted for 394 cases, of which IgAN made up 33.76%. The median age of the study population was 49.00 (interquartile range 35.25-58.00) years, and the male-to-female ratio was 2.13:1. IgAN and non-IgAN patients differed significantly in general characteristics (sex, occupation, edema, diabetes mellitus, and hypoproteinemia), urine biochemical tests, and blood biochemical tests. (2) In the IgAN diagnostic prediction study, 45 variables were ultimately included for model construction. The data were randomly split into training and test sets at a ratio of 7:3; the models were built and internally validated on the training set and then evaluated on the held-out test set. The AUCs (95% CI) of the Lasso-logistic, BPNN, and XGBoost models were 0.894 (0.857-0.924), 0.888 (0.851-0.919), and 0.999 (0.989-1.000) in the training set, and 0.831 (0.763-0.886), 0.699 (0.620-0.770), and 0.853 (0.787-0.904) in the test set, respectively. In the training set, XGBoost had the highest accuracy, precision, recall, and F1 score, followed by BPNN and then Lasso-logistic; the models ranked in the same order on these metrics in the test set. On the calibration curves, the XGBoost model was the most stable and performed better in both the training and test sets, whereas the BPNN model was less well calibrated in the training set and the Lasso-logistic model in the test set. (3) The main variables in the Lasso-logistic regression model were diabetes, urine protein (PRO) 1+, PRO 2+, occult blood (BLD) 3+, urinary red blood cells (URBC), red blood cells, granular casts, and immunoglobulin A (IgA). The top five variables by feature importance were 24-hour urinary total protein (24HUTP), total protein (TP), BLD 3+, albumin (Alb), and direct bilirubin (D-Bil) in the BPNN model, and 24HUTP, cardiac troponin I (cTnI), URBC, red blood cell distribution width-SD (RDW-SD), and Alb in the XGBoost model. Of these, 24HUTP, Alb, and TP had the greatest diagnostic value, with AUCs (95% CI) of 0.720 (0.679-0.759), 0.716 (0.675-0.754), and 0.713 (0.672-0.752), sensitivities of 74.44%, 80.45%, and 68.42%, and specificities of 62.73%, 56.69%, and 68.50%, respectively. (4) A total of 133 patients diagnosed with IgAN were followed up, and 104 were included in the prognostic study. The median follow-up time was 14.00 (interquartile range 6.75-26.25) months; eight patients had adverse prognostic outcomes, giving an incidence density of 54.83 per 1000 person-years. (5) In the IgAN prognostic study, 17 variables were ultimately included for model construction. The models were built and compared in the overall cohort; the AUCs (95% CI) of the Cox, BPNN, and XGBoost models were 0.852 (0.760-0.919), 0.921 (0.843-0.968), and 0.949 (0.880-0.984), respectively. BPNN had the highest F1 score, followed by XGBoost and then Cox. On the calibration curves, XGBoost fit best, BPNN second best, and Cox poorly. (6) The main variables in the Cox regression model were mild proteinuria, N-acetyl-β-D-glucosaminidase (NAG), mean arterial pressure (MAP), urea-to-creatinine ratio (Urea/Crea), and uric acid (UA). The top five variables by feature importance were Urea/Crea, NAG, complement C4 (C4), prealbumin (PA), and mild proteinuria in the BPNN model, and C4, mild proteinuria, NAG, Urea/Crea, and fibrin/fibrinogen degradation products (FDP) in the XGBoost model. Of these, mild proteinuria, NAG, Urea/Crea, C4, FDP, PA, and MAP had the greatest prognostic value, with AUCs (95% CI) of 0.849 (0.765-0.912), 0.820 (0.733-0.889), 0.754 (0.660-0.833), 0.723 (0.626-0.806), 0.715 (0.619-0.800), 0.711 (0.614-0.796), and 0.710 (0.613-0.795), sensitivities of 87.50%, 100.00%, 87.50%, 75.00%, 62.50%, 62.50%, and 87.50%, and specificities of 82.29%, 73.96%, 70.83%, 79.17%, 73.96%, 76.04%, and 65.62%, respectively.

Conclusions: (1) In this study, the detection rate of IgAN was 25.88%, accounting for 33.76% of primary glomerular disease, and the incidence density of adverse prognostic outcomes among IgAN patients over a median follow-up of 14 months was 54.83 per 1000 person-years. (2) The XGBoost model performed best in both the diagnostic and prognostic prediction studies of IgAN and was the optimal diagnostic and prognostic prediction model. The Lasso-logistic regression model also performed well in diagnostic prediction, as did the BPNN model in prognostic prediction. (3) 24HUTP, Alb, and TP were sensitive indicators for the clinical diagnosis of IgAN, and mild proteinuria, NAG, Urea/Crea, C4, FDP, PA, and MAP were sensitive indicators for its prognostic assessment.
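Incidence density, as reported in Results (4), is the number of adverse outcomes divided by the total person-time at risk, here scaled to 1000 person-years. The minimal Python sketch below shows the arithmetic. The function names are hypothetical. In a real analysis the per-patient follow-up times would come from the study's follow-up records, which the abstract does not report.

```python
from typing import Iterable


def person_years_from_months(follow_up_months: Iterable[float]) -> float:
    """Total person-time in years from per-patient follow-up in months."""
    return sum(follow_up_months) / 12.0


def incidence_density(events: int, person_years: float, per: float = 1000.0) -> float:
    """Adverse outcomes per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * per


def implied_person_years(events: int, rate: float, per: float = 1000.0) -> float:
    """Invert incidence_density: total person-years implied by a reported rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return events / rate * per


if __name__ == "__main__":
    # Toy follow-up times (months) for three hypothetical patients: 3.5 person-years.
    py = person_years_from_months([12, 24, 6])
    print(incidence_density(1, py))  # events per 1000 person-years for the toy data
    # Back-calculation from the abstract's figures (8 events, 54.83 per 1000 person-years).
    print(round(implied_person_years(8, 54.83), 1))
```

As a consistency check, eight events at 54.83 per 1000 person-years imply about 8 / 54.83 × 1000 ≈ 145.9 person-years of total follow-up. This figure is back-calculated from the abstract; the study does not state it.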
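For single indicators, Results (3) and (6) report an AUC together with one sensitivity/specificity pair. That pairing implies a chosen cut-off. The abstract does not say how the cut-off was chosen. The sketch below assumes the commonly used Youden index (J = sensitivity + specificity - 1) and computes the AUC with the Mann-Whitney formulation. It uses only the Python standard library; the function names and toy data are illustrative.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative);
    ties count as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is called positive when score >= cutoff, so higher values are
    assumed to indicate the outcome; negate the scores for indicators where
    lower values do.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None  # (J, cutoff, sensitivity, specificity)
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:  # on ties, the lowest cutoff is kept
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(roc_auc(scores, labels))        # 0.75
    print(youden_cutoff(scores, labels))  # (0.35, 1.0, 0.5)
```

In practice a library such as scikit-learn (`roc_auc_score`, `roc_curve`) would normally be used. This version keeps the definitions visible. The 95% confidence intervals reported in the abstract would need an additional method (for example DeLong or bootstrap), which this sketch does not implement.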
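Results (2) describe a random 7:3 split into training and test sets and compare the models on accuracy, precision, recall, and F1 score. The sketch below shows one way to do both with the Python standard library. Two assumptions apply. First, the seed is arbitrary, since the study's seed is not reported. Second, the split is a simple random split, not stratified by outcome. The function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(items: Sequence[T], train_frac: float = 0.7,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and cut it into training and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("no predictions to evaluate")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    train, test = split_train_test(range(514))  # 514 patients, as in Results (1)
    print(len(train), len(test))  # 360 154
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

With 514 patients, a 7:3 split gives 360 training and 154 test cases. A stratified split, which keeps the IgAN proportion equal in both sets, is a common alternative when one class is smaller.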