
Research On Benign And Malignant Diagnosis Of Breast Cancer Based On Feature Selection

Posted on: 2021-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Huang
GTID: 2404330611481013
Subject: Information processing and communication network system
Abstract/Summary:
Breast cancer is a global health threat to women. Early screening is the key to preventing breast cancer deaths and can significantly reduce the mortality rate. Mammography is one of the most reliable methods for early breast cancer screening. Statistically, radiologists' sensitivity in breast cancer screening is only about 75 percent, but their performance improves if they are alerted to the locations of possible abnormalities. A computer-aided diagnosis system can act as a doctor's "second assistant" and provide reference advice, which plays a very important role in accurate diagnosis. With the advent of the era of big data, using machines to assist doctors in diagnosing diseases has become a popular trend, and machine learning is now widely used in breast cancer classification. Compared with deep learning, which is prone to overfitting and poor generalization on small samples, traditional machine learning has greater advantages in small-sample learning. Therefore, this paper adopts machine learning based on feature selection to study the classification of breast cancer. Breast cancer data were used as training samples to design and complete a benign/malignant breast cancer classification experiment. Specifically, the main work of this paper includes the following aspects:

1. Because real medical data are mostly high-dimensional and nonlinear, general linear dimensionality reduction methods are difficult to apply. This paper proposes an enhanced locally linear embedding method, based on the best classification principle, that maps high-dimensional data to a low-dimensional space without changing the original local neighborhood relations.

2. To address the defect that every decision tree in a random forest has the same decision-making power, a random forest algorithm based on feature confidence weights is proposed, and Bayesian optimization is used to tune the hyperparameters of the model. The algorithm is applied to three benchmark data sets from the UCI machine learning repository and achieves good classification results.

3. The enhanced locally linear embedding algorithm and the improved random forest algorithm were applied to a breast cancer data set. From the DDSM data set, 267 mammography images were preprocessed, including image enhancement, image denoising, lesion region segmentation and feature extraction. The gray-level co-occurrence matrix was used to extract texture features of the lesion area, and the improved locally linear embedding algorithm was used to reduce the dimension of the 16 extracted texture features and eliminate redundant information. The reconstructed feature sets were then fed into the improved random forest classifier. Classification accuracy, precision, recall, F1-score and AUC were 94.01%, 93.68%, 94.12%, 93.81% and 0.99, respectively.

4. The effectiveness of the random forest model based on improved locally linear embedding and Bayesian optimization was further verified and compared with other similar algorithms. The enhanced LLE-BOARF model was used to classify the WDBC and WBC data sets from the UCI machine learning repository. The model obtained an average classification accuracy of 97.08% on the WDBC data set, with an AUC of 0.987, and an average classification accuracy of 96.68% on the WBC data set, with an AUC of 0.987. Compared with similar methods, the results demonstrate the feasibility of the proposed model.

The good results of the model presented in this paper show that it is feasible to combine a nonlinear feature selection method with an ensemble learning method for computer-aided diagnosis. This provides a research direction for computer-aided cancer diagnosis and has practical significance and application value.
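
The gray-level co-occurrence matrix step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `glcm` and `texture_features`, the 4-level toy image, the horizontal offset (dx=1, dy=0), the symmetric counting, and the choice of four descriptors (contrast, energy, homogeneity, entropy) are all assumptions made for illustration. The abstract reports 16 texture features but does not list them.

```python
import math


def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    image  : 2D list of integer gray levels in [0, levels)
    dx, dy : column and row offset between the two pixels of a pair
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                if symmetric:
                    # Count the pair in both directions.
                    counts[j][i] += 1.0
    total = sum(sum(row) for row in counts)
    # Normalize so that the entries form a joint probability table.
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """A few common texture descriptors computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


# Toy 4x4 "lesion region" already quantized to 4 gray levels (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(image, levels=4)
feats = texture_features(P)
print(feats)
```

In a pipeline like the one the abstract describes, such descriptors would be computed per lesion region (typically over several offsets and directions) and the resulting feature vectors passed on to dimensionality reduction and classification.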
Keywords/Search Tags:Gray-level Co-occurrence Matrix, Locally Linear Embedding, Random Forest, Computer Aided Diagnosis, Breast Cancer