Class Imbalance Processing Study On Prognostic Model Of Breast Cancer

Posted on:2020-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:Z Wang

GTID:2404330572981737

Subject:Epidemiology and Health Statistics

Abstract/Summary:

Objective: To use machine learning to predict prognosis (survival status) from a class-imbalanced breast cancer data set. A stable and reliable prediction model is established for breast cancer prognosis, and on this basis a relatively good prognostic prediction model is selected. The factors influencing the prognosis of breast cancer (survival or death) are discussed, and different explanatory models are used to interpret these factors from several angles.

Methods: The prognostic survival data of breast tumors were imbalanced. SMOTE, Borderline-SMOTE, ADASYN and one-sided selection were used to process the imbalanced survival data. Classical decision tree, conditional inference tree, random forest and support vector machine were used to classify prognostic status. Classifier performance was evaluated by accuracy, sensitivity, specificity, and positive and negative predictive values. After a preliminary exploratory analysis of the collected breast cancer data set, each model was interpreted in its own terms: for the logistic regression model, the regression coefficients and the univariate odds ratio estimates; for the decision tree, the factors corresponding to each branch and the probability of the corresponding outcome; and for the random forest, Lin's method, which ranked the factors influencing the prognosis and survival status of breast cancer patients.

Results: (1) Given the imbalance of survival status in the prognostic data set of breast cancer patients, one-sided selection combined with conditional inference tree prediction gave the best prognostic prediction on the imbalanced breast cancer data, raising sensitivity from 2% to 58%, a gain of 56 percentage points. (2) Forward stepwise selection in Cox analysis retained the following variables: T stage, N stage, progesterone, HER2, endocrine therapy and chemotherapy. (3) Logistic regression selected age, N stage, endocrine therapy of breast cancer, means of chemotherapy, multifocal lesions and chemotherapy. According to random forest variable importance, age, hormone receptor expression, mass size, N stage, clinical stage and T stage were the more important variables.

Conclusion: The data mining methods used in this paper can also be applied to other types of imbalanced breast cancer data sets, to support further medical research on the occurrence and development of disease, the curative effect of treatments and the factors influencing final prognosis.
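The two ideas the abstract relies on, oversampling the minority class and judging a classifier by sensitivity rather than accuracy alone, can be illustrated with a minimal sketch. This is not the thesis's code: the function names `smote_like_oversample` and `confusion_metrics` are hypothetical, the interpolation step is a simplified version of the SMOTE idea (not Borderline-SMOTE, ADASYN or one-sided selection), and the labels in the usage note are toy values, not the study's data.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Simplified sketch (assumption, not the thesis implementation): pick a
    minority point, pick one of its k nearest minority neighbours, and
    place a new point at a random position on the segment between them.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # All other minority points, nearest first (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, PPV and NPV for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }
```

As a usage illustration, a toy set of 10 positive and 90 negative labels scored against a classifier that always predicts the majority class, `confusion_metrics([1] * 10 + [0] * 90, [0] * 100)`, reports high accuracy but zero sensitivity, which is the failure mode that resampling is meant to address. Synthetic points from `smote_like_oversample` always lie on segments between existing minority points, so they stay inside the region the minority class already occupies.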

Keywords/Search Tags:

Breast cancer, Machine learning, Prognosis, Prediction, Imbalance data

Related items

1	Breast Cancer Analysis And Predictive Diagnosis Based On Data Mining
2	Breast Cancer Risk Prediction Based On Apache Spark
3	Study Of GBM Prognosis Prediction Methods Based On Multi-modal Machine Learning
4	Survival Prediction Analysis Of Breast Cancer Patients Oriented To Unbalanced Data
5	Expression Of VWCE,DPT,SCUBE3 And Prediction Of Clinical Prognosis In Breast Cancer
6	Application Of Machine Learning And Image Processing For Breast Cancer Risk Prediction And Diagnosis
7	Research On The Method Of Breast Cancer Recognition Based On Machine Learning
8	Breast Cancer Analysis And Prediction Based On Machine Learning
9	Research On Data Mining Of Blood Glucose Spectrum Based On Machine Learning
10	Research On The Prediction Of Drug Targets Based On Imbalance Data Mining