
Study on a Prediction Model of Early Colorectal Cancer Based on the SMOTE Algorithm

Posted on: 2023-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: G X Zhang
Full Text: PDF
GTID: 2544307172480154
Subject: Resources and environment
Abstract/Summary:
Colorectal cancer ranks among the three most common malignant tumors worldwide. China has a high incidence of colorectal cancer, and environmental factors have an important influence on the disease. Its high incidence and mortality have made it one of the most difficult problems in gastroenterology and oncology worldwide. The vast majority of patients have no specific clinical manifestations at the early stage, and the disease has already progressed by the time they seek hospital care. Early screening and early warning are therefore extremely important for the prevention and treatment of colorectal cancer.

Methods: A data set of colorectal cancer patients from Shanghai Renji Hospital was analyzed. In preprocessing, each feature was first checked for missing and abnormal values, which were filled with the mean, median, or mode as appropriate. Continuous features were standardized to dimensionless values, and categorical features were converted to dummy variables using one-hot encoding. To address class imbalance, the data set was first oversampled with a conventional sampling method, and the original data set was then resampled separately with the SMOTE algorithm. The two balanced data sets (one from conventional oversampling, one from SMOTE) were each combined with four classifiers (decision tree, logistic regression, random forest, and neural network) to build eight colorectal cancer prediction models.

The experimental results show that the SMOTE-based random forest model achieved an F1 score of 0.96, an accuracy of 88.9% computed from the confusion matrix, and an AUC of 0.99. Compared with the other prediction models, it had the highest accuracy and the best classification performance. Finally, a summary of the characteristics of colorectal cancer shows that a history of intestinal polyps (conventional adenomas, serrated polyps, or mixed types) is an important criterion for judging cancer risk, and that ages 40 to 60 are the period of highest incidence. Daily habits also play an important role: taking vitamin tablets as a nutritional supplement, rarely drinking milk, frequent alcohol consumption, frequent smoking, rarely drinking coffee, and frequently eating eggs. This study establishes a model that predicts colorectal cancer risk from patients' living environment and dietary habits, which can help prevent environment-related harm and also improve the accuracy of early screening, laying a foundation for subsequent research.
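The SMOTE oversampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, which the abstract does not describe. The function names `smote` and `balance`, the neighbour count `k=5`, and the fixed seed are assumptions chosen for illustration. The sketch also assumes a binary label and purely numeric features, as they would be after standardization and one-hot encoding.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats).
    k and seed are illustrative defaults, not values from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest minority neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Return copies of X and y with the minority class oversampled to parity."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    n_new = max(n_majority - len(minority), 0)
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)
```

Unlike oversampling that duplicates existing minority rows, each synthetic row here is a new point between two real minority samples. Two practical caveats apply to any such sketch: oversampling is normally applied to the training split only, so that synthetic rows do not leak into evaluation, and interpolating one-hot columns produces fractional values, which plain SMOTE does not handle specially.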
Keywords/Search Tags: colorectal cancer, data mining, SMOTE, oversampling