Breast Cancer Data Processing And Auxiliary Diagnostic Modeling

Posted on: 2022-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Li
Full Text: PDF
GTID: 2504306608968739
Subject: Automation Technology
Abstract/Summary:
The incidence of breast cancer (BC) is increasing. The disease is no longer limited to middle-aged and elderly people, and patients are becoming younger. Early- and middle-stage breast cancer can be cured through timely detection and treatment, and for patients with advanced breast cancer, treatment can prolong life and improve quality of life. However, there is a lack of research that helps doctors diagnose breast cancer effectively. Data processing is also crucial before any data-driven diagnosis: missing values in the original data set must be filled in to some extent, and a suitable feature subset must be selected for auxiliary diagnostic modeling.

To address the missing data, this study first analyzes the missingness mechanism of the original data set both vertically and horizontally. A complete subset is then extracted from the original data set, and random missing ratios of 5%, 10%, 20%, 30% and 40% are applied to it. K-Nearest Neighbor (KNN), Random Forest (RF), Multiple Imputation (MI) and Central Tendency Value Imputation (CTVI) are used to estimate the missing values of each feature at each missing ratio. Finally, the ranks that each method achieves for each feature under the different missing ratios are fused to select the most suitable imputation method.

To address irrelevant and redundant features in the breast cancer ultrasound data set, the features are ranked with the variance method, the information gain method and the correlation coefficient method, and the three rankings are fused through weight allocation to obtain the final ranking. Irrelevant and redundant features are then deleted according to expert knowledge, feature importance and the Pearson correlation coefficient. Sequential forward selection is combined with Naive Bayes (NB), decision tree and RF classifiers to measure the accuracy of each classifier on different feature subsets, and the optimal feature subset is selected according to that accuracy.

Ultrasound is a reliable standard for breast cancer diagnosis, but traditional manual reading can lead to misdiagnosis or missed diagnosis. Machine learning models have so far been insufficient to assist clinical diagnosis because a single model cannot guarantee generalization. This thesis therefore proposes a stacking-based ensemble learning model with a two-layer learning structure, and uses the proposed stacking method to determine the classifier combination for the two layers. According to the experimental results, the proposed model improves the prediction accuracy for breast cancer to a certain extent and can provide theoretical support for doctors diagnosing breast cancer.
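The imputation comparison described in the abstract (mask a complete subset at several random missing ratios, impute, then fuse per-feature ranks) can be sketched roughly as follows. This is a minimal illustration and not the thesis's code. It makes several assumptions: the data are synthetic; only two of the four methods appear (a column-mean fill standing in for CTVI and a simple hand-written KNN); RMSE on the hidden cells is the error measure; and ranks are fused by a plain unweighted mean. All function names are hypothetical.

```python
import math
import random
import statistics

def mask_at_random(rows, ratio, rng):
    """Hide about `ratio` of all cells (set to None); also return the hidden positions."""
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    hidden = set(rng.sample(cells, int(round(ratio * len(cells)))))
    masked = [[None if (i, j) in hidden else v for j, v in enumerate(row)]
              for i, row in enumerate(rows)]
    return masked, hidden

def impute_mean(masked):
    """Central-tendency fill: replace each gap with the observed column mean."""
    means = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(statistics.fmean(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in masked]

def impute_knn(masked, k=3):
    """Fill each gap with the mean of the k nearest rows that observe that column."""
    fallback = impute_mean(masked)
    filled = [row[:] for row in masked]
    for i, row in enumerate(masked):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(masked):
                if m == i or other[j] is None:
                    continue
                # Distance uses only the features observed in both rows.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((math.sqrt(sum(shared) / len(shared)), other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [value for _, value in candidates[:k]]
            filled[i][j] = statistics.fmean(nearest) if nearest else fallback[i][j]
    return filled

def rmse_per_feature(truth, imputed, hidden):
    """RMSE over the hidden cells of each column (0.0 if a column has none)."""
    errors = []
    for j in range(len(truth[0])):
        sq = [(truth[i][j] - imputed[i][j]) ** 2 for (i, c) in hidden if c == j]
        errors.append(math.sqrt(sum(sq) / len(sq)) if sq else 0.0)
    return errors

def fused_ranks(truth, methods, ratios, seed=0):
    """Mean rank of each method over all (ratio, feature) pairs; lower is better."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in methods}
    count = 0
    for ratio in ratios:
        masked, hidden = mask_at_random(truth, ratio, rng)
        errors = {name: rmse_per_feature(truth, fn(masked), hidden)
                  for name, fn in methods.items()}
        for j in range(len(truth[0])):
            # Ties are broken by dictionary order rather than shared ranks.
            ordered = sorted(methods, key=lambda name: errors[name][j])
            for rank, name in enumerate(ordered, start=1):
                totals[name] += rank
            count += 1
    return {name: total / count for name, total in totals.items()}

# Synthetic "complete subset": two correlated features and one independent feature.
data_rng = random.Random(42)
data = []
for _ in range(60):
    base = data_rng.gauss(0, 1)
    data.append([base + data_rng.gauss(0, 0.1),
                 2 * base + data_rng.gauss(0, 0.1),
                 data_rng.gauss(0, 1)])

methods = {"mean": impute_mean, "knn": impute_knn}
scores = fused_ranks(data, methods, ratios=[0.05, 0.10, 0.20, 0.30, 0.40])
best = min(scores, key=scores.get)
print(scores, "-> selected:", best)
```

Fusing ranks rather than raw errors keeps features with large numeric ranges from dominating the comparison. A weighted fusion, or additional methods such as RF-based imputation or MI, would slot into the same `methods` dictionary.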
Keywords/Search Tags: Missing value imputation, Feature selection, Breast cancer (BC), Stacking model, Ensemble learning