CPU Parallelization And Distribution Eclat Algorithm Based On Bit Storage Type Tid

Posted on: 2019-04-15   Degree: Master   Type: Thesis
Country: China   Candidate: Z X Sun   Full Text: PDF
GTID: 2428330548983458   Subject: Computer application technology
Abstract/Summary:
Frequent itemset mining (FIM) is an important step in association rule mining. Typical association rule mining algorithms include the Apriori algorithm, the DHP algorithm, Toivonen's algorithm, the Eclat algorithm, the FP-Growth algorithm and so on. These traditional, serial association rule algorithms show bottlenecks in both memory consumption and mining efficiency when dealing with both small and massive data sets. In view of current application scenarios and requirements, this thesis studies and improves the Eclat algorithm from two aspects: CPU parallelization and distributed processing. Eclat is an association rule mining algorithm based on a vertical Tid list. This thesis analyzes how Eclat generates frequent itemsets, reviews several related improved algorithms, and on that basis puts forward a CPU-parallelized Eclat algorithm built on a bit-based storage structure (Bit Parallel Eclat, hereafter referred to as "BPEclat"). In BPEclat, the transaction Tids of each item are stored as binary bits, and a parallel mining scheme allocates mining tasks to individual threads in order to make full use of the CPU's computing capacity. Experiments show that BPEclat improves the efficiency of frequent itemset mining and greatly reduces memory consumption.

To make the algorithm suitable for association rule mining on massive data, BPEclat is further integrated with the Spark framework, yielding a BPEclat algorithm for the Spark big data processing platform (Bit Parallel Eclat based on Spark, hereafter referred to as "SBPEclat"). SBPEclat takes advantage of Spark's strengths in iterative computation and big data processing. In addition, to balance the load across computing nodes during mining, a load-balancing strategy is used to optimize the partitioning into equivalence classes. Experiments on SBPEclat show that it performs well when mining frequent itemsets from massive data.
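The bit-based Tid storage and per-thread task allocation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a Python integer serves as the bit vector (bit i is set when transaction i contains the item), and the function names `build_bit_tidsets`, `support`, `extend`, `mine_class` and `bit_eclat` are hypothetical. Each top-level prefix class is handed to a thread pool only to show how the classes can be mined independently; in CPython the global interpreter lock limits the actual speed-up for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def build_bit_tidsets(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets[item] = tidsets.get(item, 0) | (1 << tid)
    return tidsets


def support(bits):
    """Support of a bit-encoded Tid set is its number of set bits."""
    return bin(bits).count("1")


def extend(prefix, items, min_support, out):
    """Depth-first Eclat step: `items` holds (item, tidset of prefix + item)."""
    for i, (item, bits) in enumerate(items):
        itemset = prefix + (item,)
        out[itemset] = support(bits)
        suffix = []
        for other, other_bits in items[i + 1:]:
            joined = bits & other_bits  # Tid-list intersection as a bitwise AND
            if support(joined) >= min_support:
                suffix.append((other, joined))
        extend(itemset, suffix, min_support, out)


def mine_class(i, items, min_support):
    """Mine every frequent itemset whose smallest item is items[i]."""
    out = {}
    # Restricting to items[i:] makes this task independent of the others.
    extend((), items[i:i + 1], min_support, out)
    item, bits = items[i]
    suffix = [(o, bits & ob) for o, ob in items[i + 1:]]
    suffix = [(o, b) for o, b in suffix if support(b) >= min_support]
    extend((item,), suffix, min_support, out)
    return out


def bit_eclat(transactions, min_support, workers=4):
    """Return {itemset tuple: support} for all frequent itemsets."""
    tidsets = build_bit_tidsets(transactions)
    items = sorted(
        (item, bits) for item, bits in tidsets.items()
        if support(bits) >= min_support
    )
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = pool.map(lambda i: mine_class(i, items, min_support),
                         range(len(items)))
        for partial in tasks:
            result.update(partial)
    return result
```

For example, calling `bit_eclat([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]], 3)` reports the three single items with support 4 and the three pairs with support 3; the triple occurs in only two transactions and is pruned. Storing one bit per transaction is what replaces the explicit Tid lists, and the intersection of two lists reduces to a single bitwise AND.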
Keywords/Search Tags:Frequent itemset mining, Bit storage, Eclat algorithm, Distributed processing, CPU parallelization