The eEP-Based Rare-Class Classification Problem

Posted on: 2006-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Liu
GTID: 2208360155469500
Subject: Computer software and theory
Abstract/Summary:
Classifying rare classes is an important problem in many practical applications. Because instances of the target class are scarce, many traditional classifiers find it difficult to classify them correctly. Since the problem is unusual and complex, few algorithms have been designed specifically for rare-class classification. This dissertation studies eEP-based classifiers for the rare-class problem. An emerging pattern is an itemset whose support increases significantly from one class to another, and the essential emerging pattern (eEP) is a special kind of emerging pattern. Essential emerging patterns retain all the strengths that make emerging patterns useful for building accurate classifiers, yet they are fewer in number, which makes them efficient to mine and use.

Ensemble learning, which originates in the machine learning field, has been one of the most effective learning methods of the last ten years and can improve the predictive accuracy of weak classifiers. Compared with a single classifier, an ensemble is less prone to overfitting.

In this dissertation, the author applies the bagging technique to the challenging problem of rare-class classification and uses the eEP-based classifier as the base classifier of the ensemble model. Using the "group bootstrap" and two different weighted-vote strategies, the author studies the application of bagging to the rare-class problem in detail and proposes several new, efficient ensemble learning algorithms.

The innovations of this dissertation are as follows. First, it proposes a new method that uses essential emerging patterns to classify rare classes: it improves the single classifier CEEP and designs a new algorithm, eEPRC, which is much better suited to the rare-class classification problem. Second, it applies the bagging ensemble technique to improve predictive performance on the rare class and proposes two new rare-class classification algorithms, VeEPRC and BeEPRC, which differ in their sampling method.
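The eEP idea behind eEPRC and both ensembles can be made concrete with a small brute-force sketch in Python. It assumes transactions are Python sets of items and uses a common formulation of an essential emerging pattern: an itemset whose support in the rare (target) class is at least `min_support`, whose growth rate (target support divided by background support, infinite when the background support is zero) is at least `min_growth`, and none of whose proper subsets already meets both conditions. The abstract does not state the exact definition or thresholds used in the thesis, and the function names here are hypothetical; practical eEP miners use far more efficient search than this exhaustive enumeration.

```python
from itertools import combinations
from math import inf


def support(itemset, transactions):
    """Fraction of transactions (sets of items) that contain every item of itemset."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def growth_rate(itemset, target, background):
    """supp_target / supp_background, or inf when the itemset never occurs in background."""
    s_t = support(itemset, target)
    s_b = support(itemset, background)
    if s_b == 0:
        return inf if s_t > 0 else 0.0
    return s_t / s_b


def mine_eeps(target, background, min_support, min_growth, max_len=3):
    """Exhaustively search itemsets of up to max_len items for minimal emerging patterns.

    Assumed eEP conditions (illustrative; the thesis's exact definition may differ):
      1. support in the target (rare) class >= min_support
      2. growth rate from background to target >= min_growth
      3. no proper subset of the itemset satisfies conditions 1 and 2
    """
    items = sorted(set().union(*target))
    patterns = []
    # Sizes are visited in increasing order, so every minimal qualifying
    # subset of the current itemset has already been recorded in patterns.
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            if any(p < x for p in patterns):  # minimality check (condition 3)
                continue
            if support(x, target) >= min_support and growth_rate(x, target, background) >= min_growth:
                patterns.append(x)
    return patterns


# Hypothetical toy data: rare-class and common-class transactions.
rare = [{"a", "b"}, {"a", "b", "c"}]
common = [{"a", "c"}, {"b", "c"}]
print([sorted(p) for p in mine_eeps(rare, common, min_support=0.5, min_growth=3.0)])
# prints [['a', 'b']]
```

In this toy data neither `a` nor `b` alone separates the classes (each has growth rate 2), but `{a, b}` never occurs in the common class, so it is reported as the only minimal pattern; the superset `{a, b, c}` is skipped by the minimality check even though it would also pass the support and growth-rate tests.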
VeEPRC uses the standard bootstrap, whereas BeEPRC uses the "group bootstrap". For BeEPRC, the author studies two weighted-vote strategies: weighting each base classifier's vote by its accuracy, and weighting it by its F-measure on the rare class. After detailed analysis, the second strategy is selected; the adjusted algorithm, called BeEPRCF, achieves better classification performance. Experimental results compare BeEPRCF, the representative algorithm of this dissertation, with classical algorithms such as NB (naive Bayes), C5.0, CMAR and CAEP.

Through studying and experimenting with the rare-class classification problem, the author has identified some workable principles. The proposed approach improves predictive power on the rare class to some extent while also achieving very high overall accuracy. This work offers a new perspective on the rare-class problem and provides extensive experimental data for future research.
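The two vote-weighting strategies can be sketched in Python as bagging whose members are weighted either by accuracy or by rare-class F-measure, F = 2PR / (P + R), where P and R are the precision and recall of the rare class. This is only a sketch under stated assumptions: it draws plain bootstrap samples (the "group bootstrap" procedure is not specified in the abstract), it measures each member's weight on that member's out-of-bag examples, and a toy 1-nearest-neighbour learner stands in for the eEP-based base classifier. All function names are hypothetical.

```python
import random
from collections import defaultdict
from functools import partial


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (the first weighting strategy)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rare_f_measure(y_true, y_pred, rare_label):
    """F-measure of the rare class, 2PR / (P + R); 0.0 if no rare example is hit."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == rare_label and p == rare_label for t, p in pairs)
    fp = sum(t != rare_label and p == rare_label for t, p in pairs)
    fn = sum(t == rare_label and p != rare_label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_bagged(data, train, score, n_models=11, seed=0):
    """Bagging over plain bootstrap samples of (x, y) pairs.

    Each member is weighted by score(y_true, y_pred) on its out-of-bag
    examples (an assumed validation scheme, not taken from the thesis).
    """
    rng = random.Random(seed)
    n = len(data)
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n indices drawn with replacement
        model = train([data[i] for i in idx])
        in_bag = set(idx)
        oob = [data[i] for i in range(n) if i not in in_bag]
        weight = score([y for _, y in oob], [model(x) for x, _ in oob]) if oob else 0.0
        ensemble.append((model, weight))
    # If every member scored 0, fall back to an unweighted majority vote.
    if all(w == 0 for _, w in ensemble):
        ensemble = [(m, 1.0) for m, _ in ensemble]
    return ensemble


def weighted_vote(ensemble, x):
    """Each (model, weight) pair adds its weight to the label it predicts."""
    votes = defaultdict(float)
    for model, weight in ensemble:
        votes[model(x)] += weight
    return max(votes, key=votes.get)


def train_1nn(sample):
    """Toy stand-in for an eEP-based base classifier: 1-nearest neighbour on one feature."""
    return lambda x: min(sample, key=lambda ex: abs(ex[0] - x))[1]


# Hypothetical imbalanced data: ten "common" points and two "rare" points.
data = [(float(v), "common") for v in range(10)] + [(20.0, "rare"), (21.0, "rare")]
f_score = partial(rare_f_measure, rare_label="rare")
ensemble = train_bagged(data, train_1nn, f_score)
print(weighted_vote(ensemble, 3.0), weighted_vote(ensemble, 20.5))
```

Passing `accuracy` as `score` instead gives the accuracy-weighted variant. On imbalanced data, a member that never predicts the rare class can still reach high accuracy, whereas its rare-class F-measure is 0, so under F-weighting it contributes nothing to the vote; this is one plausible reason to prefer the F-measure strategy.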
Keywords/Search Tags: Data Mining, Classification, Rare Class, Emerging Patterns, Ensemble Learning, Bagging