| Research Background: Ovarian diseases are of many types and are classified as benign, borderline, or malignant. Ovarian cancer, one of the three major gynecological malignant tumors, has a high mortality rate and a low rate of early diagnosis. Surgical pathology remains the gold standard for diagnosing ovarian disease; there is still no clinically validated preoperative diagnostic method suitable for population screening, let alone one that predicts the histological classification of ovarian disease. Minimally invasive, convenient, safe, and inexpensive methods that improve the accuracy of preoperative diagnosis in patients with ovarian lesions and predict histological classification would therefore provide important guidance for selecting clinical plans and evaluating prognosis. Research Objective: Three clinical prediction models and six machine-learning prediction models based on clinically accessible hematological indicators were developed for the preoperative assessment of ovarian disease, using traditional statistical methods and multiple machine-learning algorithms. The best model was then used to predict borderline ovarian tumors and to diagnose the histological classification of ovarian disease, thereby providing a basis for early diagnosis of ovarian tumors and for personalized treatment. Research Methods: This is a retrospective study. After data cleaning, 1337 patients with ovarian disease treated at Zhujiang Hospital of Southern Medical University from January 2015 to July 2022 were divided into training and validation sets at a ratio of 7:3, and another 137 patients with ovarian disease treated from August 2022 to January 2023 formed the test set, following the principle of temporal validation. Univariate-stepwise Logistic regression, optimal subset-stepwise Logistic regression, Lasso-stepwise Logistic regression, random forest (RF), support vector classification (SVC), gradient boosting decision tree (GBDT), extreme gradient boosting (XGB), multilayer perceptron (MLP), and k-nearest neighbor (KN) were applied to mine 32 clinical indicators, select the predictors most relevant to preoperative diagnosis, construct traditional statistical and machine-learning prediction models, and evaluate and validate these models in order to select the optimal one for preoperative diagnosis of ovarian disease.
Research Conclusion: The Lasso-Logistic-based binary traditional statistical prediction model and the XGB-based binary machine-learning prediction model, both built on serological indicators, can effectively distinguish benign from non-benign ovarian disease (AUC > 0.800) and are expected to serve as adjunctive methods for the preoperative diagnosis of ovarian disease. The three-class and multi-class machine-learning prediction models based on the XGB algorithm also have some predictive value for the preoperative diagnosis of borderline ovarian disease (AUC = 0.715) and of histological types (e.g., ovarian serous carcinoma, AUC > 0.900). Age, D-dimer, carbohydrate antigen 125, and fibrinogen play important roles in differentiating ovarian disease in both the machine-learning and the traditional statistical prediction models. |
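
To make the evaluation design concrete, the following is a minimal sketch of a 7:3 training/validation split and an AUC calculation of the kind the abstract reports. It is not the study's code. The records are synthetic, the scores stand in for the output of a hypothetical fitted model, the function names `split_train_validation` and `roc_auc` are illustrative, and the plain random shuffle is an assumption, since the abstract does not say how the split was drawn.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Synthetic stand-in for the cohort: (label, score) pairs, where label 1 means
# non-benign and the score mimics a model's predicted risk. Not real data.
rng = random.Random(42)
records = []
for _ in range(1337):
    label = 1 if rng.random() < 0.3 else 0
    score = rng.gauss(1.0 if label else 0.0, 1.0)
    records.append((label, score))

train, validation = split_train_validation(records)
validation_auc = roc_auc([y for y, _ in validation], [s for _, s in validation])
print(len(train), len(validation), round(validation_auc, 3))
```

In practice a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic loop. The temporally separated test set would be scored with the same function but never shuffled into the split.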