Font Size: a A A

Research And Improvement Of Association Rules Mining Algorithm Based On Directed Graph

Posted on:2016-12-18Degree:MasterType:Thesis
Country:ChinaCandidate:Y J LiaoFull Text:PDF
GTID:2308330503476721Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Association rule mining is an important approach in data mining. At present, the main a-ssociation rule mining algorithm include Apriori algorithm, Fp-growth algorithm, Eclat algorithm and so on. These algorithm has the following problems:(1) Apriori algorithm need scanning the database many times that leading a large of I/O operations, and generate many candidate sets;(2)For the large transaction database, the data represented by vertical data format(<item, tid-set>) of Eclat algorithm cannot stored in memory and when the number of transaction is large, the size of tid-set will be large, the cost of the intersection of tid-set will increase, thus the performance of the algorithm will be reduced;(3) For Fp-growth algorithm, frequently constructing the conditional pattern base will cost a lot of time, Especially when the support threshold is relatively small, the compression capacity of data of Fp-tree begin to drop,leading to many unnecessary construction operation.In this paper, several techniques are investigated to support efficient and scalable mining frequent itemset.At first, by analyzing many previous algorithm, propose three novel data structure:(1) PE_Graph:a directed graph for stored all tid-set of 2-frequent itemset, which can be used to mining all frequent itemset; (2) Tid-tree:a tree for coding the tid-set in smaller length;(3) MP-tree:a tree like Fp-tree for effectively pruning can reduce the unnecessary counting.At second, two novel algorithm for mining frequent itemset are proposed, namely PF_search algorithm and PR_search algorithm.The mining process of PF_search algorithm and PR_search algorithm include a lagre of redundant computations, so a new algorithm named Fp-Search algorithm with k road pruning is proposed to implement the efficient pruning, whose space complexity is O(n)(n is the total number of frequent itemsets). When the n is large, the cost of memory and the cost of time of pruning will be large.In order to solve above-mentioned drawbacks, Fpmix2_search algorithm and Fpmixk_search algorithm is proposed.Based on the fact that when k>0, the Fp-Search algorithm with k road pruning can effectively prune the redundant path and mine all frequent itemsets, the FNMP-search algorithm is proposed.Then discuss the parallel method of all above algorithm.At end, compare the performance of all above algorithm and Fp-growth algorithm. Experimental results show that Fp-Search algorithm with k road pruning, Fpmix2_search alg-orithm, Fpmixk_search algorithm and FNMP-search algorithm are more efficient than Fp-growth algorithm.The FNMP-search algorithm is the most efficient algorithm and PFNMP-search algorithm is also the efficient algorithm of all above-mentioned parallel algorithms.
Keywords/Search Tags:association rule mining, Apriori, Fp-growth, Eclat, parallel
PDF Full Text Request
Related items