Voting Classification Algorithms Based on EP Classifiers

Posted on: 2005-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: M X Liu
Full Text: PDF
GTID: 2208360125957465
Subject: Computer software and theory
Abstract/Summary:
Data mining, also known as Knowledge Discovery in Databases, refers to "mining" knowledge from the data in very large databases by nontrivial methods. Classification, an important theme in data mining, was studied earlier in statistics, machine learning, neural networks, expert systems, and related fields. However, most of those algorithms are memory resident and typically assume a small data size. As data grows in volume and dimensionality, building effective classifiers for large databases remains a challenge.

Classification by emerging patterns (EPs) was proposed in order to classify large datasets. EPs are a kind of knowledge pattern introduced by G. Dong and J. Li in 1999 that captures the inherent distinctions between different classes of data, which makes them useful for classification. CAEP, presented by Li, Dong and Ramamohanarao in 1999, was the first application of emerging patterns to classification. A series of EP-based classifiers followed, such as BCEP, the JEP-classifier, and DeEPs. These studies show that EP-based classifiers outperform some classic classification methods such as decision trees.

Voting classification algorithms, such as Bagging and Boosting, have been shown to be very successful in improving the accuracy of certain classifiers. A voting algorithm tries to form a powerful classifier by combining multiple weak classifiers into a council of base classifiers. The council is the final classifier used to classify the test set. Voting has become the best way of improving the accuracy of classifiers from some families of algorithms, such as decision trees, Naive Bayes, and neural networks. For other families of algorithms, however, its effect still needs further study.

In this thesis, we first present the idea of voting classification algorithms whose base algorithms are EP-based. Because no existing EP-based classifier is well suited to serve as the base algorithm, we propose a new classification algorithm called Classification by Essential Emerging Patterns (CEEP). We then propose generating multiple base classifiers by running CEEP on different bootstrap samples and combining them into a powerful classifier by voting. The resulting classifier has the advantages of both EP-based classifiers and voting classification algorithms. It is called Classification by Voting Classifiers based on Essential Emerging Patterns (CVCEEP).

The main differences between CEEP and the existing EP-based classifiers are as follows. CEEP adopts a more effective algorithm based on a pattern tree (P-tree) to mine essential emerging patterns (eEPs). Unlike the existing EP-based classifiers, CEEP uses a new scoring mechanism that measures each EP by its growth rate. Moreover, CEEP is self-adaptive with respect to its parameters. Experimental results show that CEEP performs very well, so CEEP is not only an important part of CVCEEP but also an independent EP-based classification algorithm.

To estimate the accuracy of our algorithms, we carried out an experimental study on 12 benchmark datasets from the UCI Machine Learning Repository. It shows that CEEP performs comparably with other state-of-the-art classification methods such as NB, C5.0, CBA, CMAR, CAEP and BCEP in terms of overall predictive accuracy. Furthermore, CVCEEP is considerably more accurate than CEEP and is a match for the classic classification algorithms known to us.
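
The abstract does not give CEEP's scoring formula or its P-tree mining procedure, so the Python sketch below illustrates only the two general ideas it builds on: the standard growth rate of an emerging pattern (the ratio of its support in the target class to its support in the background class), and Bagging-style voting over base classifiers trained on bootstrap samples. All function names are hypothetical, and `train_majority_label` is a toy stand-in for a real base learner such as CEEP, not the thesis's method.

```python
import math
import random
from collections import Counter

def support(itemset, rows):
    """Fraction of rows (each a set of items) that contain every item of itemset."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if itemset <= row) / len(rows)

def growth_rate(itemset, background, target):
    """Growth rate of itemset from the background class to the target class.

    Standard EP convention: 0 if the pattern occurs in neither class,
    infinity if it occurs only in the target class.
    """
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0.0:
        return 0.0 if s_tg == 0.0 else math.inf
    return s_tg / s_bg

def bootstrap_sample(data, rng):
    """Draw len(data) records from data uniformly, with replacement."""
    return [rng.choice(data) for _ in data]

def train_majority_label(sample):
    """Toy base learner (assumption): always predicts the sample's most common label."""
    label = Counter(lbl for _, lbl in sample).most_common(1)[0][0]
    return lambda x: label

def bagged_classifier(train, data, n_rounds=10, seed=0):
    """Train n_rounds base classifiers on bootstrap samples; predict by plurality vote."""
    rng = random.Random(seed)
    members = [train(bootstrap_sample(data, rng)) for _ in range(n_rounds)]

    def predict(x):
        votes = Counter(member(x) for member in members)
        # Ties go to the label that received its first vote earliest.
        return votes.most_common(1)[0][0]

    return predict

background = [{"a"}, {"a", "b"}, {"b"}, {"c"}]
target = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}]
print(growth_rate({"a", "b"}, background, target))  # 2.0 (support 0.5 vs 0.25)
```

Any function that maps a training sample to a classifier can be passed as `train`, so a real EP-based learner could replace the toy one without changing the voting code.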
Keywords/Search Tags: Machine learning, data mining, classification, emerging patterns, essential emerging patterns, voting classification algorithms