
Research On Classification Algorithm Using Emerging Patterns

Posted on: 2009-09-25, Degree: Master, Type: Thesis
Country: China, Candidate: D Shan
GTID: 2178360242974803, Subject: Software and theory
Abstract/Summary:
Data mining is the theory and practice of extracting nontrivial knowledge from the data in very large databases. Classification, an important topic in data mining, was studied earlier in statistics, machine learning, neural networks, and expert systems. Most of those algorithms are memory resident, however, and typically assume a small data size. As data grow in volume and dimensionality, building effective classifiers for large databases remains a challenge. Classification methods based on emerging patterns (EPs) were proposed in order to classify large datasets. EPs are a kind of knowledge pattern introduced by G. Dong and J. Li in 1999 that captures the inherent distinctions between different classes of data.

In this thesis, we first introduce the basic concepts and techniques of classification. We then present in detail the basic concepts of EPs and two efficient mining algorithms: the border-operation algorithm MBD-LLBORDER and the max-pattern algorithm FP-MAX. Next, we briefly describe the ideas behind classification algorithms that are built on EPs. Finally, building on these algorithms, we analyze the CAEP classification algorithm and its implementation, and implement an improvement on CAEP. We propose a new classification algorithm called Classification by essential Emerging Patterns (CeEP). The algorithm adopts a special kind of EP described by Fan and Ramamohanarao in 2003. Unlike existing EP-based classifiers, CeEP uses a new scoring mechanism that measures each EP by its growth rate. Moreover, CeEP is self-adaptive with respect to its parameters.

To evaluate the accuracy of our algorithms, we ran experiments on 22 benchmark datasets from the UCI Machine Learning Repository. The results show that CeEP performs comparably with other state-of-the-art classification methods such as NB, C4.5, TAN, and CAEP, and indicate that CeEP is an effective classifier.
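
For orientation, the growth rate that the abstract refers to is commonly defined, following Dong and Li, as the ratio of an itemset's support in a target class to its support in a background class (0 when the itemset is absent from the target class, infinite when it is absent only from the background class). The sketch below computes it on toy data and aggregates it in the CAEP style, weighting each matching EP by GR/(GR+1) times its target-class support. It is a minimal illustration under stated assumptions: the function names and the toy transactions are hypothetical, and this is not the CeEP scoring mechanism, which the abstract does not specify.

```python
from math import inf


def support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def growth_rate(pattern, background, target):
    """Growth rate of `pattern` from the background class to the target class."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_tg == 0:
        return 0.0   # absent from the target class: not an EP for it
    if s_bg == 0:
        return inf   # present only in the target class (a "jumping" EP)
    return s_tg / s_bg


def caep_style_score(instance, eps, background, target):
    """Sum of GR/(GR+1) * target support over the EPs contained in `instance`."""
    score = 0.0
    for p in eps:
        if p <= instance:
            gr = growth_rate(p, background, target)
            weight = 1.0 if gr == inf else gr / (gr + 1.0)
            score += weight * support(p, target)
    return score


# Hypothetical toy data: four transactions per class.
class_a = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
class_b = [frozenset("a"), frozenset("c"), frozenset("bc"), frozenset("cd")]

# Candidate EPs of class_a, assumed to have been mined beforehand.
eps_a = [frozenset("ab"), frozenset("a")]

print(growth_rate(frozenset("a"), class_b, class_a))   # 3.0
print(growth_rate(frozenset("ab"), class_b, class_a))  # inf
print(caep_style_score(frozenset("ab"), eps_a, class_b, class_a))  # 1.0625
```

In a full classifier, a score of this kind would be computed once per class, and the instance would be assigned to the class with the highest (usually normalized) score.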
Keywords: data mining, classification, pattern, emerging patterns