
Research And Application Of Breast Cancer Prediction Model Based On Cost-Sensitive Learning

Posted on: 2020-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2404330578467295
Subject: Computer technology
Abstract/Summary:
Breast cancer is the most common malignant tumor in women worldwide. It has a great impact on women's health, the national economy, and social development, and it has become a major public health problem. Its onset is insidious and its exact cause is not fully understood, although it is generally believed to be related to a variety of internal and external factors such as heredity, fertility, and behavioral habits. Early breast cancer has a relatively high cure rate under standardized treatment. For advanced breast cancer, some treatments can prolong survival, but a complete cure is difficult. Public awareness of early detection, early diagnosis, and early treatment is limited, and China has a large population with limited medical and health resources, so nationwide breast cancer screening is difficult to achieve. It is therefore particularly important to study breast cancer prediction models that can identify high-risk groups in time. This thesis applies cost-sensitive learning algorithms to breast cancer prediction in order to screen out high-risk populations and support the assisted detection of breast cancer. The main work is as follows:

(1) Data analysis. The project collected and compiled clinical data on 1031 breast cancer patients. This thesis analyzes the independence of the quantitative factors in the training data (data set 1) and the differences in these factors between the case group and the control group. Factors with statistically significant differences were included as breast cancer risk factors, and a new experimental data set (data set 2) was built from them.

(2) A breast cancer prediction model based on threshold-optimized Logistic regression. To address class imbalance in the experimental data, the classification threshold is optimized to improve the predictive performance of the Logistic regression model. Logistic regression models are constructed on data set 1 and data set 2 separately, and the effect of the threshold on model performance is evaluated with the precision-recall curve. Experiments show that the model built on data set 2, Main_Logistic_Model, achieves its best predictive performance at a threshold of 0.031, with an AUC of 75.08% and a sensitivity of 71.43%.

(3) A breast cancer prediction model based on the cost-sensitive decision tree C5.0. Under class imbalance, different classification errors incur different costs, so the C5.0 prediction models built on data set 1 and data set 2 are optimized by introducing a cost matrix. Experiments show that the C5.0 model built on data set 1, C5.0_Model, performs best when the cost ratio c(A)/c(B) is 18/1, with an AUC of 89.37% and a sensitivity of 100.00%.

(4) Design and implementation of a breast cancer prediction system. Based on the two classifiers proposed in this thesis, a system algorithm was developed and a breast cancer prediction system was designed and implemented, with a client side and an administrator side. By collecting user-related indicators, the client predicts a woman's risk of breast cancer, so that high-risk groups can be screened out effectively, which supports the assisted detection of breast cancer.
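The abstract does not name the statistical test used in step (1) to compare the case and control groups. As one plausible illustration only, the sketch below screens binary candidate factors with a Pearson chi-square test on a 2x2 case/control table and keeps those significant at alpha = 0.05. The table layout, the 0.05 cutoff, the function names, and the counts are all assumptions, not the thesis's procedure or data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Hypothetical layout:
                         case   control
        factor present     a       b
        factor absent      c       d
    Returns (statistic, p_value) with 1 degree of freedom.
    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def select_factors(tables, alpha=0.05):
    """Keep factor names whose case/control difference has p < alpha."""
    return [name for name, t in tables.items() if chi_square_2x2(*t)[1] < alpha]


# Made-up counts for two hypothetical factors (not data from the thesis).
tables = {
    "factor_a": (30, 10, 20, 40),
    "factor_b": (25, 25, 25, 25),
}
print(select_factors(tables))  # ['factor_a']
```

For multi-level or continuous factors a different test (for example an r x c chi-square test or a t-test) would be needed; the 2x2 case is shown only because its closed form fits in a few lines.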
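Step (2) tunes the classification threshold of the Logistic regression model instead of using the default 0.5. The abstract reports the chosen threshold (0.031) but not the rule used to pick it from the precision-recall curve. The sketch below assumes the threshold with the highest F1 score (harmonic mean of precision and recall) is chosen; the F1 criterion, the function names, and the toy scores are assumptions. The scores stand in for predicted probabilities from any fitted Logistic regression model.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold_by_f1(scores, labels):
    """Try every observed score as a threshold; return (threshold, F1) with max F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        p, r = precision_recall_at(t, scores, labels)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy predicted probabilities (hypothetical, not thesis data).
scores = [0.01, 0.02, 0.03, 0.04, 0.20, 0.60]
labels = [0, 0, 1, 0, 1, 1]
print(precision_recall_at(0.5, scores, labels))  # default cutoff: recall is only 1/3
print(best_threshold_by_f1(scores, labels))      # threshold 0.03, F1 = 6/7
```

Sweeping only the observed scores is enough, because precision and recall change only when the threshold crosses one of them.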
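Step (3) relies on a cost matrix in which different misclassification errors carry different costs. The abstract does not define A and B; the sketch below assumes c(A) is the cost of a false negative (missing a patient) and c(B) the cost of a false positive. It does not reproduce how C5.0 uses its cost matrix internally. It only illustrates the general minimum-expected-cost decision rule for a classifier that outputs a probability p of the positive class: predict positive when p * c_FN > (1 - p) * c_FP, that is, when p > c_FP / (c_FP + c_FN). With an 18:1 ratio this cutoff is 1/19, about 0.053. All names and numbers are illustrative.

```python
def min_cost_threshold(cost_fn, cost_fp):
    """Probability above which predicting 'positive' has lower expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation;
    predicting negative costs p * cost_fn.
    """
    return cost_fp / (cost_fp + cost_fn)


def predict_min_cost(probs, cost_fn, cost_fp):
    t = min_cost_threshold(cost_fn, cost_fp)
    # Ties (p == t) go to the negative class.
    return [1 if p > t else 0 for p in probs]


def total_cost(preds, labels, cost_fn, cost_fp):
    """Sum of misclassification costs; correct predictions cost 0."""
    fn = sum(1 for y_hat, y in zip(preds, labels) if y == 1 and y_hat == 0)
    fp = sum(1 for y_hat, y in zip(preds, labels) if y == 0 and y_hat == 1)
    return fn * cost_fn + fp * cost_fp


# Hypothetical 18:1 cost ratio and toy probabilities (not thesis data).
COST_FN, COST_FP = 18, 1
probs = [0.03, 0.06, 0.40, 0.90]
labels = [0, 1, 0, 1]
cost_sensitive = predict_min_cost(probs, COST_FN, COST_FP)  # [0, 1, 1, 1]
plain = [1 if p > 0.5 else 0 for p in probs]                # [0, 0, 0, 1]
print(total_cost(cost_sensitive, labels, COST_FN, COST_FP))  # 1
print(total_cost(plain, labels, COST_FN, COST_FP))           # 18
```

The example shows the trade the cost matrix encodes: accepting one cheap false positive avoids one expensive missed case.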
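The models in steps (2) and (3) are compared by AUC and sensitivity. As a reference for how these two figures are defined, the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), and sensitivity as TP / (TP + FN). The pairwise O(P x N) form is chosen for clarity rather than speed; the function names and toy data are assumptions.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC; positive/negative ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(preds, labels):
    """True positive rate: share of actual positives predicted positive."""
    tp = sum(1 for y_hat, y in zip(preds, labels) if y == 1 and y_hat == 1)
    fn = sum(1 for y_hat, y in zip(preds, labels) if y == 1 and y_hat == 0)
    return tp / (tp + fn)


# Toy data (hypothetical, not thesis data).
scores = [0.01, 0.02, 0.03, 0.04, 0.20, 0.60]
labels = [0, 0, 1, 0, 1, 1]
preds = [1 if s >= 0.03 else 0 for s in scores]
print(auc(scores, labels))         # 8/9, about 0.889
print(sensitivity(preds, labels))  # 1.0
```

Note that AUC is threshold-independent, while sensitivity depends on the chosen threshold or cost ratio, which is why the abstract reports both.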
Keywords/Search Tags: breast cancer, cost-sensitive learning, Logistic regression, decision tree C5.0, prediction model