An Essential Emerging Pattern Mining Algorithm

Posted on: 2006-03-12 | Degree: Master | Type: Thesis
Country: China | Candidate: F Wei | Full Text: PDF
GTID: 2208360155969211 | Subject: Computer software and theory
Abstract/Summary:
Data mining is known as a useful technology for finding valuable information hidden in very large databases. Classification, an important theme in data mining, has been widely used in many fields, such as government organizations, scientific research, and business corporations. Scholars working in statistics, machine learning, neural networks, expert systems, and related areas have proposed many classification algorithms, but most of them are only used on small datasets. Classification methods based on Emerging Patterns (EPs) were proposed in order to classify large datasets.

Emerging Patterns (EPs) are itemsets whose supports change significantly from one data class to another. They can serve as a good classification model because they represent knowledge that discriminates between different classes of data. CAEP, proposed by G. Dong and J. Li in 1999, is the first application of Emerging Patterns to classification. After that, a series of EP-based classifiers were proposed, such as the JEP-classifier and DeEPs. However, the number of EPs available for classification is usually large, so not every one of them can be used. Recently, a special kind of EP, named the essential Emerging Pattern (eEP), was used in classification by Fan and Ramamohanarao. Based on these new patterns, they built a Bayes classifier with excellent performance.

Following that, how to mine eEPs efficiently has become an important issue. An eEP is the "shortest" EP, and using eEPs removes redundant EPs from classification. In the border-based representation, eEPs are exactly the itemsets in the left border of the EPs, so the border-based algorithms used to obtain EPs can also be used to mine eEPs. However, this method is inefficient. To get the EPs of class C, we must first mine the max-patterns of class C and of the non-C class, and mining max-patterns is slow. Besides, border-based algorithms cannot obtain the support and growth rate, which are useful for classification, in the process of mining EPs; to get the support and growth rate of the EPs, we need to scan the dataset a second time.

This paper presents a novel algorithm named eEPMiner, which is based on the pattern tree (P-tree). The algorithm uses the pattern fragment growth strategy and only needs to scan the dataset twice to get eEPs together with their support and growth rate. Moreover, the pattern tree not only stores all items in the dataset but also records class information, so all eEPs can be mined in the pattern tree without additional space. An experimental study carried out on benchmark datasets from the UCI Machine Learning Repository shows that eEPMiner performs very well and is much faster than border-based algorithms.
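
The notions of support, growth rate, and "shortest" EP used in the abstract can be illustrated with a small brute-force sketch. This is not eEPMiner and does not use a P-tree or borders; it enumerates itemsets level by level purely to show what an eEP is. The function names, the toy data, and the thresholds `min_growth` and `min_support` are assumptions made for illustration and do not come from the thesis.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def growth_rate(itemset, background, target):
    """Ratio of support in the target class to support in the background class."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0:
        # Present only in the target class: infinite growth rate.
        return float("inf") if s_tg > 0 else 0.0
    return s_tg / s_bg

def essential_eps(background, target, min_growth, min_support):
    """Brute-force search for minimal itemsets that meet both thresholds.

    Exponential in the number of items; suitable for toy data only.
    """
    items = sorted(set().union(*background, *target))
    found = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Skip supersets of a pattern already found: they are not "shortest".
            if any(ep < candidate for ep in found):
                continue
            if (support(candidate, target) >= min_support
                    and growth_rate(candidate, background, target) >= min_growth):
                found.append(candidate)
    return found

# Hypothetical toy data: each transaction is a set of items.
target = [{"a", "b"}, {"a", "b"}, {"c"}]
background = [{"a"}, {"b"}, {"b", "c"}, {"a", "c"}]

eeps = essential_eps(background, target, min_growth=2.0, min_support=0.5)
print(eeps)
```

On this toy data neither `a` nor `b` alone reaches the assumed growth threshold, while the pair does, and the triple is skipped because it contains the pair. The sketch reads the data many times; the abstract states that eEPMiner needs only two scans.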
Keywords/Search Tags: Data mining, Classification, Emerging pattern, Essential emerging pattern