Data mining, also known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of knowledge from very large databases. Classification, an important theme in data mining, has long been studied in statistics, machine learning, neural networks, expert systems, and related fields. Most classification algorithms, however, are memory-resident and typically assume a small data size; with data growing in both volume and dimensionality, building effective classifiers for large databases remains a challenge.

Classification by emerging patterns (EPs) was proposed to handle large datasets. EPs, introduced by G. Dong and J. Li in 1999, are a kind of knowledge pattern that captures the inherent distinctions between different classes of data, and EP-based classifiers have been shown to outperform some classic methods such as decision trees. Separately, voting classification algorithms such as bagging and boosting have proved very successful at improving the accuracy of certain classifiers: they combine multiple weak classifiers into a council of base classifiers, and this council serves as the final classifier applied to the test set. Voting has become a standard way to improve the accuracy of classifiers from families such as decision trees, Naive Bayes, and neural networks; whether other families of algorithms benefit similarly still needs study.

This work first proposes a method, called BoostEP, that improves EP-based classifiers via boosting. It uses essential emerging patterns (eEPs), a special kind of emerging pattern, as the basis of its base classifiers. eEPs retain the virtues of emerging patterns that make them useful for constructing accurate classifiers, yet are far fewer in number, which makes them efficient to mine and to use.
BoostEP constructs multiple eEP-based classifiers via boosting to form an ensemble, and decides the class labels of unknown samples by combining the weighted predictions of these classifiers. To estimate the accuracy of our algorithms, we ran experiments on 21 benchmark datasets taken from the UCI Machine Learning Repository; they show that BoostEP performs comparably with other state-of-the-art classification methods such as C4.5, NB, CBA, CAEP, and BaggingEP in terms of overall predictive accuracy. The comparison with BaggingEP also shows that boosting can improve the performance of EP-based classifiers.
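The boosting-with-weighted-voting scheme described above can be sketched generically. The sketch below is a minimal AdaBoost-style loop; a simple threshold stump stands in for the eEP-based base classifiers (mining eEPs is beyond the scope of this sketch), and all names and the toy dataset are illustrative, not taken from BoostEP itself.

```python
import math

def stump(t, pol):
    """Threshold stump: predicts `pol` when x < t, else -pol."""
    return lambda x: pol if x < t else -pol

def best_stump(X, y, w):
    """Weak learner: pick the stump minimizing weighted error."""
    thresholds = [x - 0.5 for x in sorted(set(X))] + [max(X) + 0.5]
    best = None
    for t in thresholds:
        for pol in (+1, -1):
            h = stump(t, pol)
            err = sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi)
            if best is None or err < best[0]:
                best = (err, h)
    return best

def boost(X, y, rounds=5):
    """AdaBoost-style loop: reweight samples, collect (alpha, h) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, h = best_stump(X, y, w)
        if err >= 0.5:          # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, h))
        # misclassified samples gain weight, correct ones lose it
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Final classifier: weighted vote of all base classifiers."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# A labeling no single stump can fit; the boosted ensemble can.
X = [0, 1, 2, 3, 4, 5]
y = [+1, +1, -1, -1, +1, +1]
model = boost(X, y, rounds=5)
print([predict(model, x) for x in X])  # → [1, 1, -1, -1, 1, 1]
```

The weight update is what forces successive base classifiers to focus on previously misclassified samples; the per-round weight alpha is then reused at prediction time, so more accurate base classifiers get a larger say in the final vote.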