Endoscopic Retrograde Cholangiopancreatography (ERCP) is an important method for diagnosing and treating biliary and pancreatic diseases, with excellent therapeutic outcomes. Although ERCP is considered a safe procedure, it still carries a high risk of complications. The occurrence of complications is related to many factors, and their prevention and management generally rely on physicians' experience, which can lead to misdiagnosis or overtreatment. In addition, the data are imbalanced: complications are relatively rare, so subjects without complications make up most of the samples, which makes it difficult for machine learning classification models to identify the minority group of patients who develop complications. Handling data imbalance and correctly identifying minority-class samples has therefore become a primary issue in disease prediction.

This study focuses on cases of patients who underwent ERCP, with data drawn from patient records of the Digestive Department of Hubei Provincial People's Hospital, including relevant patient information and surgical operation records. The main task is to build a clinical prediction model that uses the existing data to predict postoperative complications in these patients. To address the imbalance of the complication dataset, this paper proposes a hybrid sampling algorithm based on the Neighborhood Cleaning Rule (NCL) and the Conditional Tabular GAN (CTGAN). First, the patient dataset is preprocessed; the NCL-CTGAN algorithm then uses NCL to clean up majority-class samples and CTGAN to generate additional minority-class samples, producing a balanced dataset of subjects with and without complications. On this balanced dataset, a prediction model is built with TabNet, with high-risk factor features added to the inputs.

Experiments show that NCL-CTGAN improves the classification performance of the prediction model on the resampled dataset more than other traditional sampling algorithms do, and that building a classification model on the balanced dataset improves its ability to identify minority-class samples. When the high-risk features are concatenated with the original features and TabNet is used to predict postoperative complications, this approach achieves better predictive performance than other commonly used classification models, which is attributed to the deep learning model's ability to learn deep features. AUC and F1 score both improve and recall is also high, indicating that the model can identify patients who develop complications and has practical clinical value. In addition, comparison with a model trained without the high-risk features, together with an analysis of feature importance, shows that incorporating high-risk features helps predict postoperative complications.
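The two stages of the hybrid sampling step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in this study. The cleaning stage follows the usual description of the Neighborhood Cleaning Rule with k = 3 neighbors: a majority-class sample misclassified by its neighbors is removed, and for each misclassified minority sample its majority-class neighbors are removed. The CTGAN stage is replaced by a SMOTE-style interpolation oversampler purely so that the example runs without training a GAN. The function names (knn_indices, neighborhood_cleaning, oversample_minority, hybrid_resample) are hypothetical, labels are assumed to be 0 (no complication) and 1 (complication), and features are assumed to be numeric and already scaled.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of each row of X, excluding itself."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def neighborhood_cleaning(X, y, minority=1, k=3):
    """NCL-style cleaning of the majority class; minority samples are never removed."""
    nn = knn_indices(X, k)
    remove = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        neigh = nn[i]
        # Majority vote among the k neighbors (binary 0/1 labels; odd k avoids ties).
        vote = np.bincount(y[neigh], minlength=2).argmax()
        if vote == y[i]:
            continue
        if y[i] == minority:
            # Misclassified minority sample: drop its majority-class neighbors.
            remove[neigh[y[neigh] != minority]] = True
        else:
            # Majority sample contradicted by its neighbors: treat as noise and drop it.
            remove[i] = True
    return X[~remove], y[~remove]


def oversample_minority(X, y, minority=1, k=5, seed=0):
    """Stand-in for CTGAN (assumption): SMOTE-style interpolation up to class parity."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)
    k = min(k, len(X_min) - 1)
    if n_new <= 0 or k < 1:
        return X, y
    nn = knn_indices(X_min, k)
    base = rng.integers(0, len(X_min), size=n_new)      # minority samples to start from
    partner = nn[base, rng.integers(0, k, size=n_new)]  # one of their minority neighbors
    gap = rng.random((n_new, 1))                        # interpolation weight in [0, 1)
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_out, y_out


def hybrid_resample(X, y, minority=1, seed=0):
    """Clean the majority class first, then grow the minority class to parity."""
    X_clean, y_clean = neighborhood_cleaning(X, y, minority=minority)
    return oversample_minority(X_clean, y_clean, minority=minority, seed=seed)


# Usage on synthetic data: 200 majority samples, 20 overlapping minority samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(2.5, 1.0, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
X_bal, y_bal = hybrid_resample(X, y)
print(np.bincount(y_bal))  # both classes now have the same count
```

Running the cleaning stage first means the oversampler only needs to reach the size of the cleaned majority class, so fewer synthetic samples are generated. In the study itself the second stage is CTGAN, a conditional GAN designed for tabular data with mixed continuous and categorical columns, which this interpolation does not reproduce.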
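The concatenation of high-risk features with the original features can be illustrated as below. The abstract does not say how the high-risk factors are defined, so the threshold rules, the helper name add_high_risk_features, and the column meanings here are hypothetical. The sketch only shows the mechanics of appending one binary indicator column per rule to the feature matrix before it is passed to a classifier such as TabNet.

```python
import numpy as np


def add_high_risk_features(X, rules):
    """Append one 0/1 indicator column per (column_index, threshold) rule.

    The rules are hypothetical placeholders for clinically defined risk factors.
    """
    if not rules:
        return X.copy()
    flags = np.column_stack([(X[:, col] >= thr).astype(X.dtype) for col, thr in rules])
    return np.hstack([X, flags])


# Hypothetical example: two numeric features per patient.
X = np.array([[45.0, 30.0],
              [72.0, 65.0]])
rules = [(0, 65.0), (1, 60.0)]  # hypothetical cut-offs, not taken from the study
X_aug = add_high_risk_features(X, rules)
print(X_aug)
```

The original columns are kept unchanged and the indicators are appended on the right, so the model sees both the raw values and the derived risk flags.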
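The evaluation metrics named above can be made concrete with a short NumPy sketch, which also shows why imbalance matters: a classifier that always predicts "no complication" reaches high accuracy but zero recall and zero F1 on the minority class. AUC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function names are illustrative; in practice library implementations such as scikit-learn's roc_auc_score and f1_score would normally be used.

```python
import numpy as np


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (complication) class."""
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = scores[y_true == positive]
    neg = scores[y_true != positive]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Imbalanced toy labels: 95 negatives, 5 positives; the model predicts "negative" for all.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)
accuracy = float(np.mean(y_pred == y_true))
print(accuracy, precision_recall_f1(y_true, y_pred))  # 0.95 (0.0, 0.0, 0.0)

# Ranking example: 3 of the 4 positive-negative pairs are ordered correctly.
print(roc_auc(np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.35, 0.8])))  # 0.75
```

Because recall and F1 are computed only for the positive class, they expose the failure of the all-negative model that accuracy hides, which is why the abstract reports AUC, F1 and recall rather than accuracy.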