Application Of CatBoost Algorithm In Breast Tumor Diagnosis Research

Posted on: 2020-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: G L Huang
Full Text: PDF
GTID: 2437330575960950
Subject: Applied Statistics
Abstract/Summary:
In recent years, breast cancer has been the leading cause of death among women, and the affected population is becoming steadily younger. The diagnosis and treatment of breast cancer has therefore become a top priority of medical research. Diagnosis is complicated and depends on many factors, so improving diagnostic efficiency is an urgent problem. To address it, this thesis takes the breast cancer data set of 569 samples in the UCI machine learning database as its research object. It builds breast cancer diagnosis classifiers with the Support Vector Machine (SVM), Random Forest, XGBoost and CatBoost algorithms, and compares the results to find the optimal classifier. XGBoost and CatBoost, which integrate and optimize decision trees, both classify well. CatBoost reaches an accuracy of about 99%, which further improves classification performance and can help doctors diagnose the disease more accurately.

The main content of the thesis is as follows:

(1) Study the clinical diagnostic indicators of breast cancer, select appropriate attributes as the object of this data mining study, and establish the breast cancer data set. Compare the effects of different data mining algorithms on the breast tumor data set and identify the most effective method.
(2) Survey the literature to find classification algorithms suited to breast tumor data, and introduce the selected SVM, Random Forest, XGBoost and CatBoost algorithms.
(3) Run simulation experiments on the data sets in Python with the SVM, Random Forest, XGBoost and CatBoost algorithms, and compare the advantages and disadvantages of each algorithm.
(4) Evaluate and compare the resulting diagnostic classifiers to find the optimal one.
(5) Verify the feasibility and superiority of the CatBoost model on a heart disease data set.

The experimental results show that the CatBoost classifier performs best, with a classification accuracy of 99.4%, while the support vector machine, random forest and XGBoost classifiers reach 94.7%, 95.3% and 96.5% respectively. The boosting-based ensemble algorithms, XGBoost and CatBoost, outperform both the single SVM algorithm and the random forest algorithm, raising the AUC value by 1% to 4%.
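The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual code. It assumes the 569-sample UCI data correspond to the Wisconsin Diagnostic Breast Cancer set bundled with scikit-learn as `load_breast_cancer`. The 70/30 split, the seed, and all hyperparameters are assumptions chosen for illustration, so the scores it prints will not match the figures reported above. XGBoost and CatBoost are included only if those packages are installed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(seed=0):
    """Return the classifiers to compare (hyperparameters are illustrative)."""
    models = {
        # SVMs are sensitive to feature scale, so standardize first.
        "SVM": make_pipeline(
            StandardScaler(), SVC(probability=True, random_state=seed)
        ),
        "Random Forest": RandomForestClassifier(
            n_estimators=200, random_state=seed
        ),
    }
    # The boosting libraries are optional dependencies in this sketch.
    try:
        from xgboost import XGBClassifier

        models["XGBoost"] = XGBClassifier(n_estimators=200, random_state=seed)
    except ImportError:
        pass
    try:
        from catboost import CatBoostClassifier

        models["CatBoost"] = CatBoostClassifier(
            iterations=200,
            random_seed=seed,
            verbose=0,
            allow_writing_files=False,
        )
    except ImportError:
        pass
    return models


def compare(seed=0, test_size=0.3):
    """Fit each model on one stratified split; report accuracy and AUC."""
    X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        positive_proba = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "accuracy": accuracy_score(y_test, predictions),
            "auc": roc_auc_score(y_test, positive_proba),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare().items():
        print(f"{name}: accuracy={scores['accuracy']:.3f}, auc={scores['auc']:.3f}")
```

A single train/test split gives noisy estimates on a data set this small. Repeating the comparison with cross-validation (for example scikit-learn's `cross_val_score`) would give a steadier ranking of the four classifiers.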
Keywords/Search Tags:Breast Cancer, Support Vector Machine, XGBoost, CatBoost