Strong Jumping Emerging Patterns Mining Algorithm And Application

Posted on: 2012-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: L J Lu
GTID: 2248330395985186
Subject: Computer Science and Technology
Abstract/Summary:
Classification is one of the most important problems in data mining and has been widely studied in neural networks, statistics, and machine learning. However, most classification algorithms are suitable only for small datasets. In recent years, strong jumping emerging patterns (SJEPs) have been introduced as a novel kind of knowledge pattern with a strong ability to distinguish between two datasets. To classify large datasets, SJEP-based classification algorithms have been proposed, and they achieve good classification accuracy. This thesis studies SJEP-based classification. Its main contents and contributions are as follows:

(1) To address the redundancy problem of the SJEP-tree-based mining algorithm, we propose an improved SJEP mining algorithm based on a sorted SJEP-tree. The algorithm adds a tag field to the header table, which effectively filters out a large number of redundant JEPs. It also copies the serial number of each item in the header table into the tree nodes, which simplifies the construction of the sorted SJEP-tree and its suffix sub-trees. To reduce the number of comparisons between JEPs, it stores SJEPs in an adjacency table. Experimental results show that our algorithm runs more efficiently than the SJEP-tree mining algorithm.

(2) To address the redundancy problem in the SJEP-tree mining algorithm and the frequent sub-tree merging in the P-tree mining algorithm, we propose a novel SJEP mining algorithm based on a sub-pattern tree (SP-tree). To reduce how often sub-trees are merged, the algorithm first counts the number of distinct nodes in the horizontal list of each item and then decides whether a merge is needed. To prune redundant branches during sub-tree merging, the values of nodes in the sub-tree are set dynamically. Experimental results show that SJEP mining based on the SP-tree takes less running time than the SJEP-tree mining algorithm.

(3) To evaluate the performance of SJEP-based classification, we run classification experiments using stratified ten-fold cross-validation. Experimental results show that, on the same dataset, different minimum support thresholds yield different accuracy. With an appropriately chosen minimum support threshold, good classification accuracy can therefore be achieved with a small number of SJEPs.

(4) We analyze and compare the running time of the STSJEP-tree algorithm and the SP-tree mining algorithm. Experimental results show that, for the same dataset and minimum support threshold, the SP-tree mining algorithm is faster than the STSJEP-tree mining algorithm.
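
To make the notion of an SJEP and the role of the minimum support threshold concrete, the following is a minimal brute-force sketch, not the tree-based algorithms proposed in the thesis. It assumes the commonly used definition, which the abstract does not spell out: an itemset is an SJEP if its support is zero in one dataset, at least the minimum support threshold in the other, and no proper subset already meets both conditions. The function names `support` and `mine_sjeps` and the toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def mine_sjeps(target, background, min_support):
    """Enumerate SJEPs from `background` to `target` by exhaustive search.

    Assumed definition: support is 0 in `background`, at least
    `min_support` in `target`, and no proper subset qualifies (minimality).
    Exponential in the number of items, so only suitable for toy data.
    """
    items = sorted(set().union(*target)) if target else []
    found = []
    # Candidates are visited by increasing size, so any qualifying proper
    # subset has already been recorded when a superset is examined.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if any(prev < candidate for prev in found):
                continue  # not minimal: a smaller SJEP is contained in it
            if (support(candidate, background) == 0
                    and support(candidate, target) >= min_support):
                found.append(candidate)
    return found


# Toy two-class data (hypothetical).
target = [frozenset("abc"), frozenset("ab"), frozenset("abd"), frozenset("cd")]
background = [frozenset("ac"), frozenset("bc"), frozenset("ad"), frozenset("bd")]

for threshold in (0.5, 0.25):
    patterns = [sorted(p) for p in mine_sjeps(target, background, threshold)]
    print(threshold, patterns)
```

On this toy data, a threshold of 0.5 keeps only {a, b}, while lowering it to 0.25 also admits {c, d}. This illustrates how the choice of minimum support threshold changes the number of SJEPs available to a classifier.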
Keywords/Search Tags: Data Mining, Classification Algorithm, Emerging Patterns, Jumping Emerging Patterns, Strong Jumping Emerging Patterns