Objective: We compared and analyzed the early clinical data of COVID-19 inpatients aged 60 years and above, explored the risk factors associated with COVID-19 readmission in this age group, and screened important features that could help predict readmission. We then developed a Nomogram model and a Random forest model and validated their performance, aiming to provide a rapid, effective, and objective reference for the early identification of potential readmissions among inpatients and for the stratified management of clinical interventions.

Methods: The early clinical data (baseline data, initial clinical symptoms and imaging manifestations, and early laboratory test indexes) of 675 patients aged 60 years and above with COVID-19 admitted to Wuhan Vulcan Hill Hospital from February 4 to March 30, 2020 were retrospectively analyzed. Patients were divided into two groups: a mild group (mild and common types) and a severe group (severe and critical types). After basic data cleaning, they were randomly split into a training set and a test set at a ratio of 8:2. The clinical data of the two datasets were compared, all characteristic variables were then tested for between-group differences on the training set, and two models were built separately.
Nomogram model: First, the characteristic variables that differed significantly between groups were entered into univariate analysis. The variables that were significant in univariate analysis were then checked for collinearity, and those with a variance inflation factor (VIF) > 5 were excluded. The remaining variables were entered into a multivariate binary logistic regression to identify independent risk factors for severe COVID-19, and a Nomogram model was constructed from these factors. Finally, receiver operating characteristic (ROC) curves, calibration curves (bootstrap method with 1,000 resamples), and decision curve analysis (DCA) were used to evaluate the model's predictive performance on the training and test sets.
Random forest model: First, all features of the training set were entered into a random forest to build a preliminary model and obtain the variable importance measure (VIM) based on the mean decrease in Gini impurity. The model was then optimized, and the final model was validated on both datasets using ROC curves and confusion matrices.

Results:
(1) Of the 540 patients in the training set, 324 (60%) were in the mild group and 216 (40%) in the severe group; of the 135 patients in the test set, 86 (63.7%) were in the mild group and 49 (36.3%) in the severe group. Clinical information did not differ significantly between the training set (n = 540) and the test set (n = 135) (P > 0.05).
(2) Univariate and multivariate binary logistic regression of the indicators that differed significantly on the training set identified comorbid diabetes [OR = 1.838 (95% CI 1.167 to 2.896)], age [OR = 1.054 (95% CI 1.026 to 1.083)], α-hydroxybutyrate dehydrogenase (α-HBDH) [OR = 1.007 (95% CI 1.003 to 1.011)], D-dimer (D-D) [OR = 1.265 (95% CI 1.074 to 1.490)], lymphocyte percentage (LYM%) [OR = 0.955 (95% CI 0.932 to 0.980)], and total platelet count (PLT) [OR = 0.995 (95% CI 0.993 to 0.998)] as independent risk factors for severe COVID-19 (P < 0.05); with ORs below 1, LYM% and PLT were inversely associated with severity. The Nomogram model built on these factors achieved AUCs of 0.801 (95% CI 0.753 to 0.831) and 0.793 (95% CI 0.725 to 0.877) on the training and test sets, respectively, indicating good discrimination. The calibration curves obtained by the bootstrap method with 1,000 resamples on the training and test sets lay close to the diagonal, with mean absolute errors (MAE) of 0.011 and 0.024, respectively, indicating good calibration. The DCA curves on the training and test sets showed a high net clinical benefit for threshold probabilities from 0.1 to 1.0.
(3) Feature screening on the training set based on VIM using CELF identified LYM%, neutrophil-to-lymphocyte ratio (NLR), D-D, α-HBDH, LYM, CRP, age, lactate dehydrogenase (LDH), absolute neutrophil count (NEU), and PLT as important features associated with critically ill COVID-19 patients. A random forest model was built with 200 decision trees (ntree = 200), 3 randomly selected candidate variables at each split (mtry = 3), and a minimum split sample size of 2, chosen to balance variance and bias. The AUCs on the training and test sets were 0.946 and 0.876, respectively. On the training and test sets, respectively, accuracy was 0.954 and 0.800, specificity 0.981 and 0.607, precision 0.960 and 0.810, recall 0.950 and 0.800, and F1 score 0.950 and 0.790, indicating good discrimination and high prediction accuracy.

Conclusions:
(1) Combining the results of the two algorithms, this study identified age, α-HBDH, D-D, LYM%, and PLT as independent risk factors associated with COVID-19 readmission in patients aged 60 years and older; the two models provide an objective reference for assessing the risk of readmission in these patients.
(2) Nomogram model: Based on binary logistic regression, we developed a risk prediction model for readmission of patients aged 60 years and older with COVID-19 from clinical data readily available at admission, and internally validated it on the test set, where it showed good discrimination, good calibration, and a high net clinical benefit. Using the model to stratify admitted patients by risk allows early identification of potentially severe cases and provides a rapid, effective, and objective reference for the rational allocation of health resources and for clinical intervention.
(3) Random forest model: Using the random forest algorithm for feature engineering on limited, small-sample, high-dimensional data, ten important features were screened out to build a clinically usable classification model, and internal validation with the confusion matrix showed good discrimination and prediction accuracy. A comparison with the Nomogram model indicated that the model has good clinical application value.
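The collinearity screen in the Nomogram workflow drops variables with VIF > 5. As a minimal sketch (not the authors' code, which the abstract does not show), the VIF of predictor j equals 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix, which is what the code computes. The helper names `pearson`, `invert`, `vif`, and `screen_by_vif` are hypothetical. The sketch removes every flagged variable in a single pass, as the Methods describe; many analyses instead remove the worst variable and recompute.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inv = invert(corr)
    return [inv[i][i] for i in range(k)]


def screen_by_vif(named: Dict[str, Sequence[float]], threshold: float = 5.0) -> List[str]:
    """Keep the predictors whose VIF does not exceed `threshold` (single pass)."""
    names = list(named)
    scores = vif([named[name] for name in names])
    return [name for name, score in zip(names, scores) if score <= threshold]


if __name__ == "__main__":
    # Toy data (not study data): "z" is almost a copy of "x", so both are flagged.
    toy = {"x": [1, 2, 3, 4, 5], "z": [1, 2, 3, 4, 5.1], "y": [2, 1, 4, 3, 5]}
    print(screen_by_vif(toy))  # ['y']
```

With two predictors the formula reduces to 1/(1 − r²); for example, r = 0.8 gives a VIF of about 2.78, well under the cutoff of 5.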
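In binary logistic regression each reported OR equals exp(β) for a one-unit increase in that predictor, so a change of Δ units multiplies the odds by exp(β·Δ) = OR^Δ, with the other predictors held fixed. The sketch below applies this to the point estimates from Results (2). It only yields relative odds: the absolute probabilities a nomogram reads off would also need the model intercept, which the abstract does not report. `odds_ratio_for_change` is a hypothetical helper name.

```python
import math

# Point estimates of the odds ratios reported in Results (2), per one-unit increase.
REPORTED_OR = {
    "age (years)": 1.054,
    "alpha-HBDH": 1.007,
    "D-dimer": 1.265,
    "LYM%": 0.955,
    "PLT": 0.995,
}


def odds_ratio_for_change(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a change of `delta` units in one logistic-regression predictor.

    The coefficient is beta = ln(OR); a change of `delta` units multiplies the
    odds by exp(beta * delta) = OR ** delta, other predictors held fixed.
    """
    beta = math.log(or_per_unit)
    return math.exp(beta * delta)


if __name__ == "__main__":
    # A patient 10 years older: odds of severe disease multiplied by about 1.69.
    print(round(odds_ratio_for_change(REPORTED_OR["age (years)"], 10), 2))  # 1.69
    # A LYM% 10 points lower (delta = -10): odds multiplied by about 1.58.
    print(round(odds_ratio_for_change(REPORTED_OR["LYM%"], -10), 2))  # 1.58
```

This is why per-unit ORs such as 1.007 for α-HBDH look small: they compound over the range of the laboratory value, whose units the abstract does not state.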
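The AUCs reported for both models can be read as the probability that a randomly chosen severe patient receives a higher predicted score than a randomly chosen mild patient (the Mann-Whitney formulation of the area under the ROC curve, with ties counted as one half). A minimal stdlib sketch follows; `roc_auc` is a hypothetical name, and the toy scores are illustrative, not study data.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (labels: 1 = severe, 0 = mild)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # severe patient correctly ranked higher
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 3 of the 4 severe/mild pairs are ranked correctly.
    print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is O(n_pos × n_neg), which is fine at this study's scale (a few hundred patients); a rank-based version scales better for large datasets.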
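Decision curve analysis plots net benefit against the threshold probability p_t, using NB = TP/n − (FP/n) · p_t/(1 − p_t), and compares the model with the treat-all and treat-none (NB = 0) strategies. The sketch below assumes patients are classified as high risk when their predicted probability is at least p_t. Because the weight p_t/(1 − p_t) is undefined at p_t = 1, it accepts only 0 < p_t < 1. The function names and toy data are hypothetical.

```python
from typing import Sequence


def _check_threshold(threshold: float) -> None:
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")


def net_benefit(probs: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    _check_threshold(threshold)
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # odds at the threshold: harm of a false positive
    return tp / n - fp / n * weight


def net_benefit_treat_all(outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of treating every patient (the 'treat all' reference curve)."""
    _check_threshold(threshold)
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    probs, outcomes = [0.9, 0.6, 0.4, 0.1], [1, 1, 0, 0]  # toy data
    for t in (0.1, 0.3, 0.5, 0.7):
        print(t, round(net_benefit(probs, outcomes, t), 3),
              round(net_benefit_treat_all(outcomes, t), 3))
```

A model is clinically useful at a given threshold when its curve lies above both the treat-all curve and the treat-none line at zero.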
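The accuracy, specificity, precision, recall, and F1 score reported for the random forest all follow from the four confusion-matrix counts. The minimal sketch below treats the severe group as the positive class. The abstract does not say whether its values refer to the severe class or are averaged across classes, so these definitions are an assumption. The class name `ConfusionMatrix` is hypothetical, and the counts in the example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # severe patients predicted severe
    fp: int  # mild patients predicted severe
    tn: int  # mild patients predicted mild
    fn: int  # severe patients predicted mild

    @classmethod
    def from_labels(cls, y_true: Sequence[int], y_pred: Sequence[int]) -> "ConfusionMatrix":
        pairs = list(zip(y_true, y_pred))
        return cls(tp=pairs.count((1, 1)), fp=pairs.count((0, 1)),
                   tn=pairs.count((0, 0)), fn=pairs.count((1, 0)))

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:  # also called sensitivity
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


if __name__ == "__main__":
    cm = ConfusionMatrix(tp=40, fp=5, tn=45, fn=10)  # hypothetical counts
    print(cm.accuracy(), cm.specificity(), cm.recall())  # 0.85 0.9 0.8
```

Reporting specificity alongside recall matters here: with 60% mild patients, a model can reach reasonable accuracy while still missing a sizable share of one class.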