| Objective: To develop several machine learning models from data collected at multiple medical institutions and evaluate their performance in the adjuvant diagnosis of cervical cancer, providing a reference for the development of machine learning algorithms in this field and for the feasibility of their practical clinical application. In addition, comparing the importance of variables across the models allows risk factors affecting the incidence of cervical cancer to be evaluated from another perspective, providing evidence and reference for clinical practice. Methods: Between 12 January 2018 and 30 December 2021, data on patients with cervical lesions were collected from several medical institutions across China, mainly covering patients' age, fertility history, gynecological examinations, etc. The models were constructed using logistic regression, decision tree, support vector machine and naive Bayes algorithms. Part of the data was set aside as an external validation set; the remaining data were oversampled and used for training and internal validation of the models. The main performance metrics were the area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, etc. In addition, the weight ranking of each variable in the models was calculated to assess each variable's contribution to the diagnosis of cervical cancer. Results: A total of 7,458 subjects were included in this study, of whom 1,795 had a cervical biopsy pathology result of cervical intraepithelial neoplasia grade 2 or above (≥CIN2) and 5,663 had a result below grade 2 (<CIN2). The internal
validation results showed that the decision tree model had the highest area under the receiver operating characteristic curve (AUC) and accuracy among the models, at 0.929 (95% CI: 0.918-0.940) and 0.849 (95% CI: 0.833-0.864), respectively; the support vector machine had the next highest AUC and accuracy, at 0.914 (95% CI: 0.901-0.926) and 0.838 (95% CI: 0.822-0.855), respectively; the logistic regression model had an AUC of 0.909 (95% CI: 0.896-0.922) and an accuracy of 0.823 (95% CI: 0.806-0.840), and the naive Bayesian model had an AUC of 0.897 (95% CI: 0.883-0.911) and an accuracy of 0.817 (95% CI: 0.799-0.834); by comparison, diagnosis by colposcopists alone had an AUC of 0.830 (95% CI: 0.810-0.849) and an accuracy of 0.833 (95% CI: 0.816-0.849). The external validation results showed that the naive Bayesian model had the highest AUC, 0.874 (95% CI: 0.843-0.905), with an accuracy of 0.782 (95% CI: 0.755-0.809); the logistic regression model had the highest accuracy, 0.785 (95% CI: 0.758-0.812), with an AUC of 0.862 (95% CI: 0.830-0.895); the decision tree model had the lowest AUC and accuracy, at 0.842 (95% CI: 0.808-0.876) and 0.744 (95% CI: 0.715-0.773), respectively; and the support vector machine model had an AUC of 0.844 (95% CI: 0.807-0.882) and an accuracy of 0.780 (95% CI: 0.753-0.807). The variable importance evaluation showed that colposcopy results had the highest weight in all four models. In the three models other than logistic regression, cytology results and HPV test results had the second and third highest weights, respectively; in the logistic regression model, the second highest weight was for HPV test results and the third highest for the presence of multiple HPV infections. The lowest weight was given to the number of pregnancies in the logistic regression model, the type of transformation zone in the support vector machine model, menopausal status in the decision tree model, and the presence of multiple HPV infections in the naive Bayesian
model. Conclusion: The four cervical cancer adjuvant diagnostic models in this study showed good diagnostic performance in both internal and external validation, with high AUC, sensitivity and specificity, and can provide some degree of assistance and reference for colposcopists in diagnosing cervical cancer. The variable importance evaluation showed that, in addition to colposcopy, cytology and HPV testing, variables such as age and type of transformation zone also carried high weights, which can provide some reference for subsequent cervical cancer screening and diagnosis. The applicability and feasibility of machine learning-based cervical cancer diagnosis models in clinical practice are limited by the poor interpretability of the algorithms and the small sample size of this study, so further research is needed to support their clinical application. |
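
The performance metrics named in the Methods can be illustrated with a minimal Python sketch. It assumes binary labels in which 1 denotes CIN2 or above and 0 denotes <CIN2. The function names `diagnostic_metrics` and `auc` are hypothetical and chosen only for illustration; the abstract does not state which software the study used, so this should not be read as the authors' implementation.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (assumed: 1 = CIN2 or above, 0 = <CIN2).

    Assumes both classes occur in y_true and both occur in y_pred;
    otherwise one of the denominators below is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC above is quadratic in the number of cases, which is acceptable for a sketch; a rank-based computation or an established library routine would be the more usual choice for a dataset of several thousand subjects.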