
CART Algorithm Based On Feature Selection

Posted on: 2021-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: R Tang
Subject: Statistics
GTID: 2428330623967968
Abstract/Summary:
When the classification and regression tree (CART) algorithm is applied to data with many features, or to datasets whose features are strongly correlated, its classification accuracy suffers, and no previous study has applied CART after feature selection to classify medical datasets. Meanwhile, some researchers have combined Bayesian theory with CART, mainly using the prior probability and the splitting criterion to control the size of the tree, which in turn affects classification accuracy. The traditional prior specification has some shortcomings, and research on these methods has already matured. Building on this work, finding a more suitable prior probability and splitting criterion should further improve the performance of Bayesian CART classification. This thesis therefore uses feature selection algorithms as a basis for further improving the classification performance of CART.

First, for a breast cancer dataset with many features, the classification accuracy of CART under different feature selection algorithms is compared, with the aims of reducing computational cost and improving accuracy. Fitting the accuracies near the highest observed accuracy shows that CART's accuracy approximately follows a Poisson distribution in the neighborhood of the optimal feature subset. Second, entropy is proved to be similar to the Gini index and is adopted as CART's splitting criterion. Based on how the number of features affects classification accuracy, a prior probability of the importance feature is proposed, and within the Bayesian framework a Bayesian CART algorithm based on this importance feature prior is presented. By proving that the maximum posterior probability corresponds to the minimum entropy, a criterion for selecting the tree is derived. Finally, a classification model based on this Bayesian CART algorithm is built and applied to the breast cancer and hepatitis datasets. Compared with standard CART and with CART based on feature selection, the Bayesian CART algorithm based on the importance feature prior achieves higher classification accuracy.
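The similarity between entropy and the Gini index can be illustrated with a small sketch. This is an illustrative example, not code from the thesis: the helper names (`gini`, `entropy`, `split_impurity`, `best_split`) and the toy dataset are hypothetical. Gini impurity is 1 - sum_k p_k^2 and entropy is -sum_k p_k ln p_k. Since -ln p >= 1 - p, with near equality when p is close to 1, the Gini index is a first-order approximation of entropy and never exceeds it. This is one standard way to see why the two criteria usually choose the same split; the thesis's own proof may proceed differently.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of a node in nats: -sum_k p_k ln p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log(c / n) for c in Counter(labels).values())

def split_impurity(rows, labels, feature, threshold, impurity):
    """Size-weighted impurity of the binary split x[feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

def best_split(rows, labels, impurity):
    """Exhaustive CART-style search for the (feature, threshold) with lowest impurity."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({x[f] for x in rows}):
            score = split_impurity(rows, labels, f, t, impurity)
            # Strict '<' keeps the first split found among ties.
            if best is None or score < best[0]:
                best = (score, f, t)
    return best  # (weighted impurity, feature index, threshold)

# Hypothetical toy data: two numeric features, binary class labels.
rows = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 7), (6, 2)]
labels = [0, 0, 0, 1, 1, 1]

for name, crit in [("gini", gini), ("entropy", entropy)]:
    score, feature, threshold = best_split(rows, labels, crit)
    print(f"{name}: split on feature {feature} at <= {threshold} (impurity {score:.3f})")
```

On this separable toy data, both criteria select feature 0 at threshold 3. On real data they can occasionally rank near-tied splits differently, because entropy penalizes mixed nodes slightly more heavily than the Gini index.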
Keywords: CART algorithm, feature selection method, Bayesian CART algorithm, prior probability of the importance feature