| Objective: This study constructs a model to predict the risk of interstitial lung disease complicating rheumatoid arthritis from the clinical history data of patients with rheumatoid arthritis, with the aim of reducing the incidence and mortality of this complication. Methods: The clinical records of 712 patients with rheumatoid arthritis who attended the Department of Rheumatology and Immunology at the Second Hospital of Shanxi Medical University between December 2019 and October 2022 were collected retrospectively, including general information, laboratory tests, and past medication history. The t test, rank-sum test, and chi-square test were used to screen for variables related to concurrent interstitial lung disease. The data set was divided into a training set and a test set in a 3:1 ratio. The variables screened by univariate analysis were used as explanatory variables, and the outcome variable was the presence or absence of concurrent interstitial lung disease. Finally, the performance of each model was evaluated on the test set by accuracy, precision, recall, F1 value, and AUC value. Results: After screening, the final variables used for modeling were gender, smoking history, drinking history, rheumatoid factor, antinuclear factor, anti-keratin antibody, methotrexate history, age, number of painful joints, white blood cell count, erythrocyte sedimentation rate, C-reactive protein, lactate dehydrogenase, serum total protein, serum albumin, anti-cyclic citrullinated peptide antibody, interleukin-2, interleukin-4, interleukin-10, interleukin-17, tumor necrosis factor-α, and interferon-γ. On the test set, the accuracy, precision, recall, F1 value, and AUC value of the logistic regression model were 85.31%, 78.38%, 61.70%, 69.05%, and 83.30%, respectively. The decision tree model was tuned on the training set; the best parameters were the Gini index as the feature selection criterion and a complexity parameter (cp) of 0.0165 after pruning. On the test set, the decision tree model had an accuracy of 83.05%, a precision of 68.89%, a recall of 65.96%, an F1 value of 67.39%, and an AUC value of 76.20%. In the random forest model, the number of trees (ntree) was 500 and the best number of randomly selected features (mtry) was 6. On the test set, the random forest model had an accuracy of 85.88%, a precision of 77.50%, a recall of 65.96%, an F1 value of 71.26%, and an AUC value of 91.30%. In the SVM model, the key parameters were gamma = 1e-05 and cost = 1e+05. On the test set, the SVM model had an accuracy of 80.79%, a precision of 62.75%, a recall of 68.09%, an F1 value of 65.31%, and an AUC value of 90.80%. Conclusion: On comprehensive evaluation, the random forest model outperformed the decision tree, support vector machine, and traditional logistic regression models in predicting interstitial lung disease complicating rheumatoid arthritis, and it can support clinicians in decision-making. In addition, the random forest model can rank variables by importance; the top ten variables, in order, were interleukin-17, tumor necrosis factor-α, interleukin-4, smoking history, rheumatoid factor, anti-cyclic citrullinated peptide antibody, gender, methotrexate history, interferon-γ, and interleukin-2. In clinical practice, interstitial lung disease complicating rheumatoid arthritis can be predicted from patients' clinical information, laboratory indicators, medication history, and other information, which provides a basis for early intervention and treatment and may thereby reduce the incidence and mortality of this complication. |
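The evaluation metrics named in the abstract follow from confusion-matrix counts, and the F1 value is the harmonic mean of precision and recall. The sketch below is illustrative only and is not the authors' code: the function names `f1_score`, `classification_metrics`, and `f1_gaps` are hypothetical, and the only data it uses are the precision, recall, and F1 percentages reported above. It recomputes each F1 value from the reported precision and recall as a consistency check.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# (precision %, recall %, F1 %) on the test set, as reported in the abstract.
REPORTED = {
    "logistic regression": (78.38, 61.70, 69.05),
    "decision tree": (68.89, 65.96, 67.39),
    "random forest": (77.50, 65.96, 71.26),
    "SVM": (62.75, 68.09, 65.31),
}


def f1_gaps():
    """Absolute gap (percentage points) between recomputed and reported F1."""
    return {
        name: abs(f1_score(p, r) - f1)
        for name, (p, r, f1) in REPORTED.items()
    }


if __name__ == "__main__":
    for name, (p, r, f1) in REPORTED.items():
        print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}%, reported = {f1:.2f}%")
```

For all four models the recomputed F1 agrees with the reported value to within rounding. `classification_metrics` can be applied to any confusion matrix; the abstract itself does not report the underlying counts.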