Font Size: a A A

Research On Parallelization Of Frequent Itemsets Mining Algorithm

Posted on:2020-11-17Degree:MasterType:Thesis
Country:ChinaCandidate:X HeFull Text:PDF
GTID:2428330623959893Subject:Computer technology
Abstract/Summary:PDF Full Text Request
As the main research object in the field of data mining,association rule mining is to discover interesting related information between items in relational data sets,and the main cost of association rule mining is caused by frequent itemset mining.However,the efficiency of the traditional frequent itemset mining algorithm is not well,and with the improvement of performance of computer hardware and software,data mining based on parallel computing has become a hot topic in academia.Therefore,how to improve the traditional serial mining algorithm and how to use parallel technology to improve the performance of frequent itemset mining algorithm are studied.In practical applications,data sets are not necessarily static,so frequent itemset mining algorithms are generally divided into static mining algorithms and dynamic updating algorithms.Based on the shortcomings of the two kinds of algorithms and parallel computing technology,the serial and parallel schemes of traditional algorithms are studied,and the static mining algorithm called FP-growth and dynamic updating algorithm called FUP are emphatically studied.On the basis of the original algorithm,the improved serial mining algorithm and the parallel mining algorithm of CUDA GPU based on FPgrowth are proposed,as well as the parallel updating algorithm of CUDA GPU based on FUP.FP-growth is an efficient mining algorithm based on recursion and pattern growth.However,recursive mode can bring huge cost of time and space.Therefore,an improved non-recursive algorithm named NRFP-growth and a parallel algorithm named GPFP-growth based on FP-growth are proposed.The NRFP-grwoth algorithm introduces the data structure of FP-array to store data sets,and uses the structure of ItemPoss-map to mine frequent itemsets.The GPFP-growth algorithm is based on NRFPgrowth,and uses GPU to accelerate the process of mining frequent itemsets.FUP is a serial incremental updating algorithm of frequent itemsets.It is based on Apriori algorithm,so there are problems of generating candidate itemsets and repeatly scanning database.Therefore,a parallel algorithm based on FUP named GPFUP is proposed,which accelerates the generation and filtering of candidate itemsets by introducing the prefix tree,and completes the parallel computing task of support by introducing the BitTable in GPU.In order to test the performance of the improved algorithm,the classical serial algorithm is used as benchmark.The time and space performance of the serial improved algorithm and the acceleration and scalability of the parallel algorithm are tested for different data sets.Experiments show that the improved serial mining algorithm based on FP-growth and the parallel mining algorithm based on GPU have better performance in mining frequent itemsets.
Keywords/Search Tags:Frequent Itemset, Parallel Computing, CUDA GPU, FP-growth, FUP
PDF Full Text Request
Related items