
A Two-Stage Classification Approach Based on eEP

Posted on: 2005-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Sun
GTID: 2208360125457463
Subject: Computer software and theory
Abstract/Summary:
Classification is an important data-mining task and remains an active research topic in statistics, machine learning, neural networks, and expert systems. It has broad applications in government organizations, scientific domains, business corporations, and elsewhere. Techniques that learn rule-based models are especially popular for solving classification problems in data mining. However, most traditional rule-based classification algorithms use the sequential covering technique when training classification rules, which raises problems that are hard to tackle, especially when classifying rare classes. For this reason, Ramesh Agarwal and Mahesh V. Joshi proposed a rule-based two-phase classification method in 2000 and showed that the two-phase method is well suited to classification and, in particular, outperforms traditional algorithms on rare-class classification.

A novel kind of knowledge pattern, the Emerging Pattern (EP) (Dong & Li, 1999), has been introduced and studied extensively in data mining. EP-based classification algorithms capture multi-attribute distinctions between different datasets by aggregating the differentiating power of a collection of EPs. This compensates for the limitations of traditional classification methods, which consider only one group of attributes, and achieves satisfactory results. However, the number of EPs in dense, high-dimensional datasets is huge, which increases time and space complexity. More recently, a new kind of EP, the essential Emerging Pattern (eEP) (Fan & Ramamohanarao, 2000), has been proposed; it efficiently reduces the redundancy of EPs without losing much information that is useful for classification.

In this paper, we propose a novel algorithm called CEEPTP (Classification of Essential Emerging Patterns in Two Phases), which combines the advantages of the two-phase framework and of eEPs in classification. CEEPTP mines eEPs in two phases for classification and lets the second phase modify the result of the first, an approach that has some similarities with TPCEP. CEEPTP differs from TPCEP in two ways: 1) it adopts a new scoring strategy based on the growth rate of each eEP, which makes full use of the eEPs' differentiating power, and 2) it adjusts the weight of the second phase so that it plays an assisting role, correcting the result of the first phase.

An experimental study on 11 benchmark datasets from the UCI Machine Learning Repository shows that CEEPTP performs comparably with other strong classification methods such as NB, C4.5, CBA, CMAR, CAEP, and BCEP. Moreover, CEEPTP achieves better classification accuracy than TPCEP and CEEP on many datasets. Finally, to show the effect of modifying the role of the second phase, we compare the results of CEEPTP with those of its version before the adjustment on many datasets; the results indicate that CEEPTP brings some improvement in classification precision.
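
To make the notion of growth rate concrete, here is a minimal Python sketch. It is not the CEEPTP scoring strategy, which this abstract does not spell out. It uses the standard growth-rate definition for emerging patterns (the ratio of a pattern's support in the target class to its support in the background class) and, as a stand-in, a CAEP-style aggregate score. All function names and the toy data are hypothetical and chosen only for illustration.

```python
from math import inf


def support(itemset, dataset):
    """Fraction of instances in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for inst in dataset if itemset <= inst) / len(dataset)


def growth_rate(itemset, background, target):
    """Growth rate of `itemset` from the background class to the target class.

    Returns 0 when the pattern never occurs in the target class, and
    infinity when it occurs in the target class but never in the background.
    """
    s_bg = support(itemset, background)
    s_t = support(itemset, target)
    if s_t == 0:
        return 0.0
    if s_bg == 0:
        return inf
    return s_t / s_bg


def strength(gr):
    """Map a growth rate to [0, 1]; an infinite growth rate maps to 1."""
    return 1.0 if gr == inf else gr / (gr + 1.0)


def caep_style_score(instance, patterns, background, target):
    """CAEP-style score (an assumption, not CEEPTP's own formula):
    sum of strength * target support over the patterns the instance contains."""
    total = 0.0
    for p in patterns:
        if p <= instance:
            gr = growth_rate(p, background, target)
            total += strength(gr) * support(p, target)
    return total


# Toy data (hypothetical): each instance is a set of attribute-value items.
target = [frozenset("ab"), frozenset("abc"), frozenset("a"), frozenset("c")]
background = [frozenset("a"), frozenset("c"), frozenset("bc"), frozenset("cd")]
patterns = [frozenset("ab"), frozenset("a")]

print(growth_rate(frozenset("a"), background, target))   # 0.75 / 0.25 = 3.0
print(growth_rate(frozenset("ab"), background, target))  # inf
print(caep_style_score(frozenset("ab"), patterns, background, target))
```

A classifier built on this idea would compute one such score per class and predict the class with the highest score; how CEEPTP weights the two phases is beyond what this sketch covers.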
Keywords/Search Tags: Data Mining, Classification, Two Phase, Emerging Patterns, essential Emerging Patterns, growth rate