Data mining, also known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of knowledge from very large databases. Classification, an important theme in data mining, has long been studied in statistics, machine learning, neural networks, expert systems, and related fields. Most classification algorithms, however, are memory-resident and typically assume a small data size; with data growing in both volume and dimensionality, building effective classifiers for large databases remains a challenge.

Classification by emerging patterns (EPs) was proposed to handle large datasets. EPs, introduced by G. Dong and J. Li in 1999, are a kind of knowledge pattern that captures the inherent distinctions between different classes of data, and EP-based classifiers have been shown to outperform some classic methods such as decision trees. Separately, voting classification algorithms such as bagging and boosting have proved very successful at improving the accuracy of certain classifiers: they combine multiple weak classifiers into a council of base classifiers, and this council serves as the final classifier applied to the test set. Voting has become a standard way to improve the accuracy of classifiers from families such as decision trees, Naive Bayes, and neural networks; whether other families of algorithms benefit similarly still needs study.

This work first proposes a method, called BoostEP, that improves EP-based classifiers via boosting. It uses essential emerging patterns (eEPs), a special kind of emerging pattern, as the basis of its base classifiers. eEPs retain the virtues of emerging patterns that make them useful for constructing accurate classifiers, yet are far fewer in number, which makes them efficient to mine and to use.
BoostEP constructs multiple eEP-based classifiers via boosting to form an ensemble, and decides the class labels of unknown samples by combining the weighted predictions of these classifiers. To estimate the accuracy of our algorithms, we ran experiments on 21 benchmark datasets taken from the UCI Machine Learning Repository; they show that BoostEP performs comparably with other state-of-the-art classification methods such as C4.5, NB, CBA, CAEP, and BaggingEP in terms of overall predictive accuracy. The comparison with BaggingEP also shows that boosting can improve the performance of EP-based classifiers.
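The boosting-with-weighted-voting scheme described above can be sketched generically. The sketch below is a minimal AdaBoost-style loop; a simple threshold stump stands in for the eEP-based base classifiers (mining eEPs is beyond the scope of this sketch), and all names and the toy dataset are illustrative, not taken from BoostEP itself.

```python
import math

def stump(t, pol):
    """Threshold stump: predicts `pol` when x < t, else -pol."""
    return lambda x: pol if x < t else -pol

def best_stump(X, y, w):
    """Weak learner: pick the stump minimizing weighted error."""
    thresholds = [x - 0.5 for x in sorted(set(X))] + [max(X) + 0.5]
    best = None
    for t in thresholds:
        for pol in (+1, -1):
            h = stump(t, pol)
            err = sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi)
            if best is None or err < best[0]:
                best = (err, h)
    return best

def boost(X, y, rounds=5):
    """AdaBoost-style loop: reweight samples, collect (alpha, h) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, h = best_stump(X, y, w)
        if err >= 0.5:          # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, h))
        # misclassified samples gain weight, correct ones lose it
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Final classifier: weighted vote of all base classifiers."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# A labeling no single stump can fit; the boosted ensemble can.
X = [0, 1, 2, 3, 4, 5]
y = [+1, +1, -1, -1, +1, +1]
model = boost(X, y, rounds=5)
print([predict(model, x) for x in X])  # → [1, 1, -1, -1, 1, 1]
```

The weight update is what forces successive base classifiers to focus on previously misclassified samples; the per-round weight alpha is then reused at prediction time, so more accurate base classifiers get a larger say in the final vote.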