
Classification Based On eEP Two-Stage Approach

Posted on: 2004-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: W M Zhi
GTID: 2208360095450170
Subject: Computer software and theory
Abstract/Summary:
Data mining, also known as knowledge discovery in databases, is the process of finding information in very large databases to support decision makers. Classification, an important topic in data mining, was studied earlier in statistics, machine learning, neural networks, expert systems and other fields. However, most of these algorithms are memory-resident and typically assume a small data size. As data grows in volume and dimensionality, building a highly efficient classifier for large databases has become a very challenging problem.

Traditional rule-based classifiers learn rules with the sequential covering technique, but this technique can produce models that cover many examples of the non-target class (negative examples) and that fail to classify the rare class. Motivated by this, Ramesh Agarwal and Mahesh V. Joshi presented a new classification framework named Two-Phase Rule Induction. Their experimental results show that Two-Phase Rule Induction performs well when classifying the rare class.

Emerging Patterns (EPs) are a new kind of knowledge pattern (combinations of attribute values) that capture multi-attribute differences between data classes, so they can serve as a basic means of classification. Several EP-based algorithms have been built and have achieved good results, but some datasets yield a very large number of EPs, many of which are not useful for classification. Fan therefore proposed a special type of EP, Essential Emerging Patterns (eEPs), which are believed to be the most useful patterns for classification.

In this thesis we propose a novel approach, TPeEP, which can be viewed as a hybrid of an eEP-based classifier and a two-phase classifier. TPeEP learns eEPs in two phases and uses the second phase to correct the errors of the first. When classifying, we use all the eEPs from both phases and take the second phase's correction into account. We define two scoring methods for TPeEP, Scoring of the Examples and eEP Covering, and evaluate both scoring methods experimentally in the two-phase and the one-phase settings.
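The abstract does not define EPs, eEPs or the two scoring methods formally, so the following is only a minimal sketch under stated assumptions, not TPeEP itself. It assumes the standard EP growth rate (support in the target class divided by support in the background class); it assumes an "essential" pattern is a minimal itemset with zero background support and at least `min_support` target support, in the spirit of Fan's essential patterns; and it scores instances with a CAEP-style aggregate score, omitting CAEP's normalization. All function names and the toy weather data are hypothetical, and the brute-force search is only practical for tiny datasets.

```python
from itertools import combinations

INF = float("inf")


def support(itemset, dataset):
    """Fraction of instances (frozensets of attribute=value items) containing itemset."""
    if not dataset:
        return 0.0
    return sum(1 for inst in dataset if itemset <= inst) / len(dataset)


def growth_rate(itemset, background, target):
    """Growth rate from background to target: supp_target / supp_background.
    Infinite when the itemset occurs in target but never in background (a jumping EP)."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0:
        return INF if s_tg > 0 else 0.0
    return s_tg / s_bg


def mine_eeps(target, background, min_support, max_len=3):
    """Brute-force search for 'essential' patterns under an ASSUMED definition:
    zero support in background, support >= min_support in target, and minimal
    (no proper subset already satisfies both conditions)."""
    items = sorted(set().union(*target))
    eeps = []
    for k in range(1, max_len + 1):  # smaller itemsets first, so minimality can be checked
        for combo in combinations(items, k):
            cand = frozenset(combo)
            if any(e <= cand for e in eeps):
                continue  # a qualifying subset exists, so cand is not minimal
            if support(cand, background) == 0 and support(cand, target) >= min_support:
                eeps.append(cand)
    return eeps


def aggregate_score(instance, patterns, target, background):
    """CAEP-style score (without CAEP's base-score normalization): sum of
    growth/(growth+1) * target support over patterns contained in instance."""
    score = 0.0
    for p in patterns:
        if p <= instance:
            g = growth_rate(p, background, target)
            weight = 1.0 if g == INF else g / (g + 1)  # jumping patterns get weight 1
            score += weight * support(p, target)
    return score


def classify(instance, data_by_class, min_support=0.2, max_len=3):
    """Mine patterns for each class against all other classes and return the
    class with the highest score (ties go to the first class in dict order)."""
    scores = {}
    for label, target in data_by_class.items():
        background = [inst for other, rows in data_by_class.items()
                      if other != label for inst in rows]
        patterns = mine_eeps(target, background, min_support, max_len)
        scores[label] = aggregate_score(instance, patterns, target, background)
    return max(scores, key=scores.get)


# Hypothetical toy data: each instance is a set of attribute=value items.
data = {
    "yes": [frozenset({"outlook=sunny", "humidity=normal"}),
            frozenset({"outlook=overcast", "humidity=high"}),
            frozenset({"outlook=overcast", "humidity=normal"})],
    "no": [frozenset({"outlook=sunny", "humidity=high"}),
           frozenset({"outlook=rain", "humidity=high"})],
}

print(classify(frozenset({"outlook=sunny", "humidity=high"}), data,
               min_support=0.3, max_len=2))  # prints "no"
```

In this toy run the only pattern matching the query is {humidity=high, outlook=sunny}, which never occurs in class "yes", so class "no" wins. A two-phase variant would add a second pattern set learned to correct first-phase errors; the abstract does not say how TPeEP mines or weights that set, so it is left out here.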
Using datasets from the UCI Machine Learning Repository, the experiments show that far fewer eEPs are obtained than EPs and that the second phase corrects the errors of the first phase well. We compare our results with NB, C5.0, CAEP, LB and BCEP; the comparison shows that our method is as accurate as these state-of-the-art classifiers.
Keywords/Search Tags: Data Mining, Classification, Target Class, Non-Target Class, Emerging Patterns, Essential Emerging Patterns, Two Phase, Border, Similarity Rate