Class Imbalance Processing Study On Prognostic Model Of Breast Cancer

Posted on: 2020-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
GTID: 2404330572981737
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: To predict survival status from a class-imbalanced breast cancer data set using machine learning; to establish stable and reliable prognostic prediction models and select the best-performing one; and to examine the factors influencing prognosis (survival or death) in breast cancer, using several explanatory models to interpret these factors from different angles.

Methods: The prognostic survival data of breast tumors were imbalanced. SMOTE, Borderline-SMOTE, ADASYN and one-sided selection were used to process the imbalanced survival data. Classical decision tree, conditional inference tree, random forest and support vector machine were used to classify prognostic status. Classifier performance was evaluated by accuracy, sensitivity, specificity, and positive and negative predictive values. After a preliminary exploratory analysis of the collected breast cancer data sets, the influencing factors were interpreted as follows: for the logistic regression model, through the regression coefficients and the univariate odds ratio estimates; for the decision tree, through the factors corresponding to each branch and the probability of the corresponding outcomes; and for the random forest, Lin's method was used to rank the factors influencing the prognosis and survival status of breast cancer patients.

Results: (1) Given the imbalance in survival status in the prognostic data sets of breast cancer patients, one-sided selection combined with conditional inference tree prediction gave the best prognostic prediction on the imbalanced breast cancer data, raising sensitivity from 2% to 58%, an increase of 56 percentage points. (2) Forward stepwise selection in the Cox analysis retained the following variables: T stage, N stage, progesterone, HER2, endocrine therapy and chemotherapy. (3) Logistic regression selected age, N stage, endocrine therapy of breast cancer, mode of chemotherapy, multifocal lesions and chemotherapy. According to the random forest variable importance, age, hormone receptor expression, mass size, N stage, clinical stage and T stage were the more important variables.

Conclusion: The data mining methods used in this paper can also be applied to other types of imbalanced breast cancer data sets, to support further medical research on the occurrence and development of disease, the curative effect of treatment methods, and the factors influencing final prognosis.
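
The evaluation indicators named in the Methods (accuracy, sensitivity, specificity, and the positive and negative predictive values, which the abstract also calls "hit rates") can all be derived from a binary confusion matrix. The following is a minimal sketch, not the thesis's own code: the function name `binary_metrics`, the label coding (1 = death, 0 = survival) and the toy data are assumptions made purely for illustration. It shows why accuracy alone can look good on imbalanced data while sensitivity stays low.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, PPV and NPV for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix cells, treating `positive` as the class of interest.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Hypothetical toy data (not from the thesis): 10 deaths among 100 patients.
y_true = [1] * 10 + [0] * 90
# A classifier that almost always predicts the majority class (survival).
y_pred = [1] * 1 + [0] * 9 + [0] * 90

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On this toy example the classifier is correct for 91 of 100 patients, yet it identifies only 1 of the 10 deaths. This is the kind of gap that resampling techniques such as one-sided selection are meant to narrow.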
Keywords/Search Tags:Breast cancer, Machine learning, Prognosis, Prediction, Imbalance data