Research On Spark-based Association Rule Mining Algorithms

Posted on: 2020-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Luo
Full Text: PDF
GTID: 2428330596974936
Subject: Computer Science and Technology
Abstract/Summary:
Since the beginning of the twenty-first century, data in every field has grown explosively. Association rules, an important method of data mining, have been studied by many scholars. However, the current mainstream association rule mining algorithms suffer from serious bottlenecks and low computational efficiency. This thesis addresses the low efficiency of association rule mining. It introduces the particle swarm optimization (PSO) algorithm and combines it with the FP-Growth (Frequent Pattern Growth) algorithm, using particle optimization to replace the recursion over conditional trees in order to improve efficiency. The improved algorithm is then parallelized on the Spark computing platform. The specific work of this thesis is as follows:

(1) The improvement directions and ideas of existing association rule mining algorithms are studied. The Apriori algorithm must scan the transaction database multiple times, which makes rule mining slow, and the FP-Growth algorithm consumes a large amount of memory. To address these problems, a Particle Swarm Optimization Frequent Pattern (PSO-FP) algorithm is proposed. Particles use a binary encoding, and a suitable fitness function is defined. The entire transaction data set is stored in memory as a frequent pattern tree, and particle optimization replaces the recursive iteration of FP-Growth. Analysis of the experimental results shows that the PSO-FP algorithm effectively improves the efficiency of association rule mining.

(2) Because PSO-FP cannot handle association rule mining on large data volumes, the parallel implementation of the PSO-FP algorithm is studied and two parallelization strategies are proposed. The first is Parallel Particle Swarm Optimization Frequent Pattern (PPSO-FP). The second is Parallel Conditional Frequent Pattern (PCFP). Performance comparison experiments were carried out on the public WebDoc dataset. The experimental results show that the PPSO-FP algorithm is inefficient because of the large communication overhead across the cluster, whereas the PCFP algorithm is clearly superior to the other parallel algorithms in the efficiency of association rule mining.

In general, association rule mining based on particle swarm optimization transforms association rule mining into a multi-objective optimization problem, and the efficiency of the algorithm is clearly improved. It also mines more rules than the other optimization algorithms combined with the Apriori algorithm. The PCFP algorithm likewise shows clear advantages in parallel computing efficiency. The algorithm has application prospects in the research field of association rule mining.
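To make the idea of a binary-coded particle with a fitness function concrete, the following is a minimal sketch in plain Python. It is not the thesis's PSO-FP algorithm: the abstract does not give the fitness function, the update rule, or the frequent-tree storage, so everything here is an assumption made for illustration. The names `support`, `fitness`, and `binary_pso`, the fitness definition (itemset size times support, zero below the minimum support), and the inertia and acceleration constants are all hypothetical, and transactions are held in a simple list of sets rather than an FP-tree.

```python
import math
import random


def support(mask, transactions):
    """Fraction of transactions that contain every item whose bit is 1 in mask."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    if not chosen:
        return 0.0
    hits = sum(1 for t in transactions if chosen <= t)
    return hits / len(transactions)


def fitness(mask, transactions, min_support):
    """Assumed fitness: prefer larger itemsets, but only when they are frequent."""
    s = support(mask, transactions)
    return sum(mask) * s if s >= min_support else 0.0


def binary_pso(transactions, n_items, min_support,
               n_particles=20, n_iters=50, seed=0):
    """Search for a high-fitness itemset with a standard binary PSO update."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration constants

    # Each particle is a bit vector: bit d == 1 means item d is in the itemset.
    pos = [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(n_particles)]
    vel = [[0.0] * n_items for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, transactions, min_support) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(n_items):
                v = (w * vel[k][d]
                     + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                     + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # A sigmoid of the velocity gives the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[k][d]))
                pos[k][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[k], transactions, min_support)
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy transaction database over items 0, 1, 2 (illustrative data only).
    toy = [{0, 1, 2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
    best, best_fit = binary_pso(toy, n_items=3, min_support=0.5)
    print(best, best_fit)
```

In this sketch the swarm explores itemsets directly instead of recursing over conditional trees, which is the general substitution the abstract describes. A parallel variant would additionally have to decide how particles or conditional pattern bases are partitioned across workers, which the abstract names (PPSO-FP, PCFP) but does not specify.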
Keywords/Search Tags: Association rule mining, particle swarm optimization algorithm, Spark, FP-Growth