| Background: Breast cancer has the highest incidence of any malignant tumor worldwide and ranks first in mortality among malignant tumors in women. Triple-negative breast cancer (TNBC) is the breast cancer molecular subtype with the worst prognosis. The recurrence rate is generally higher in the first 1-3 years after surgery, and patients who relapse have a worse prognosis; the median survival is only 13.3 months. Many risk factors for postoperative recurrence of TNBC have been reported, but they still suffer from insufficient accuracy and poor accessibility. Constructing a predictive model for postoperative recurrence of TNBC from clinically available prognostic factors is therefore of great value in helping clinicians assess recurrence risk and manage TNBC patients individually. Machine learning (ML), a branch of artificial intelligence, builds and learns models from sample data. Because of its good accuracy, ML plays an important role in predicting disease prognosis; commonly used algorithms include logistic regression, decision tree (DT), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost). In addition, as algorithms have advanced, model interpretation has gradually shifted toward explaining the importance of features within the "black box" rather than the specific decision-making process. It is therefore particularly important to construct an ML model with high predictive performance and to interpret it visually. Objective: 1. To explore the prognostic factors that affect the 3-year disease-free survival rate after TNBC surgery. 2. To apply machine learning algorithms to construct a predictive model for the 3-year disease-free survival rate after TNBC surgery, interpret the model, and build an application, providing a clinical reference for clinicians evaluating the risk of postoperative recurrence in TNBC 
patients. Methods: Clinical data (24 predictive factors) were retrospectively collected for TNBC patients who visited Lanzhou University First Hospital from January 1, 2014 to September 1, 2019. After a preliminary analysis, the dataset was randomly divided into a training set and a test set in a 7:3 ratio. Logistic regression, DT, RF, SVM, and XGBoost algorithms were applied to the training set to build prediction models for the 3-year disease-free survival rate after TNBC surgery, and the models' hyperparameters were tuned. The confusion matrix, ROC curve, area under the curve (AUC), and calibration curve were then used on the test set to evaluate and compare the predictive performance of the models. The optimal model was selected, its feature importance was interpreted with SHAP, and a Python web application was built with the Streamlit library. Results: 1. The study ultimately included 260 TNBC patients who met the criteria. Patients were divided into a recurrence group (55 cases, 21.2%) and a non-recurrence group (205 cases, 78.8%) according to whether disease recurrence or metastasis occurred during the 3-year follow-up. Comparison of baseline clinical characteristics between the two groups showed statistically significant differences in Ki-67, the interval between surgery and postoperative adjuvant treatment, absolute neutrophil count, absolute lymphocyte count, NLR, and triglycerides (P<0.05). 2. Five ML models were constructed: logistic regression, DT, RF, SVM, and XGBoost. The XGBoost model had the best predictive performance, with an accuracy of 92.31%, a precision of 83.33%, a recall (sensitivity) of 71.43%, a specificity of 96.88%, an F1 score of 76.92%, a Brier score of 7.85%, and an AUC of 0.9141. 3. SHAP analysis was used to visualize the decision weights of the features within the XGBoost model and their impact on the direction of prediction. NLR, absolute neutrophil count, Ki-67 
expression status, absolute lymphocyte count, BMI, triglycerides, and chemotherapy duration had a positive impact on the 3-year disease-free survival rate of TNBC patients after surgery, while CEA and LMR had a negative impact. The corresponding SHAP values were 0.72, 0.39, 0.39, 0.35, 0.29, 0.27, 0.24, 0.22, and 0.2, respectively. Conclusion: Among the machine learning algorithms compared (logistic regression, DT, RF, SVM, and XGBoost), XGBoost had the best predictive performance. SHAP-based visual interpretation of the XGBoost model indicated that NLR, LMR, absolute neutrophil count, Ki-67 expression status, absolute lymphocyte count, BMI, triglycerides, chemotherapy duration, and CEA are important factors affecting the 3-year disease-free survival rate of TNBC patients after surgery. A web application built on the XGBoost prediction model is of great value in helping clinicians evaluate the risk of postoperative recurrence and manage TNBC patients individually. |
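
The accuracy, precision, recall, specificity, and F1 score reported above all derive from a confusion matrix. The sketch below shows one way these quantities could be computed. The function name `classification_metrics` and the example counts are assumptions made for illustration: the abstract does not report the actual confusion matrix, and the counts were chosen only because they yield percentages matching the reported ones.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. there is at least one
    actual positive, one actual negative, and one predicted positive.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)   # share of predicted recurrences that were real
    recall = tp / (tp + fn)      # sensitivity: share of real recurrences detected
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # share of non-recurrences correctly identified
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, NOT taken from the study; they are simply one set of
# counts whose metrics agree with the percentages quoted in the abstract.
metrics = classification_metrics(tp=5, fp=1, fn=2, tn=31)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With an imbalanced outcome such as the 21.2% recurrence rate here, accuracy alone can look favorable even when recall is modest, which is one reason to read precision, recall, and specificity together.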