The Research And Implementation On Association Rule Mining Algorithm Based On Spark

Posted on: 2018-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: H He
GTID: 2348330518496430
Subject: Computer Science and Technology
Abstract/Summary:
Data mining is the process of discovering underlying patterns or knowledge in datasets. Within this field, association rule mining has been a widely studied topic around the world. Association rule mining reveals potential relationships among multiple variables or objects and consists of two steps: frequent itemset mining and rule generation. Of the two, frequent itemset mining consumes so much time and memory that it becomes the bottleneck of association rule mining. Therefore, since Agrawal proposed Apriori, many researchers have proposed new association rule mining algorithms or adapted and optimized existing ones. With the arrival of the big data era, however, traditional stand-alone, sequential algorithms cannot cope with the exponentially growing scale of data. Against this background, many researchers have devoted themselves to parallel association rule mining. Without the support of a well-performing distributed computing platform, however, these parallel algorithms cannot handle communication among cluster nodes and do not scale well. Fortunately, excellent distributed computing platforms, represented by Hadoop and Spark, have emerged in recent years. Among them, Spark greatly outperforms Hadoop in iterative computing, which offers an opportunity to improve the efficiency of association rule mining.

This thesis first surveys existing association rule mining algorithms and summarizes and integrates the adaptation and parallelization strategies that have been proposed. On that basis, it implements optimized versions of Apriori, FP-growth, Eclat and other algorithms on Spark and demonstrates the efficiency gains by comparison with the corresponding Hadoop versions. Second, building on the study of traditional association rule mining, it further studies high utility itemset mining and implements a parallel high utility itemset mining algorithm on Spark. Finally, to handle extremely large datasets when mining high utility itemsets, it proposes a sampling strategy that is combined with the parallel high utility itemset mining algorithm. Overall, the thesis studies association rule mining algorithms and distributed computing platforms, Spark in particular, and verifies the soundness of its approach through extensive experiments.
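As a rough illustration of the two steps named in the abstract (frequent itemset mining, then rule generation), the following is a minimal single-machine sketch in plain Python. It is not the thesis's Spark implementation: the function names `frequent_itemsets` and `generate_rules`, the use of an absolute support count for `min_support`, and the toy transactions are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search.

    min_support is an absolute transaction count (an assumption of this sketch).
    Returns a dict mapping each frequent itemset (frozenset) to its support count.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: every distinct item is a candidate 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune every
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support count is already in the dict.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Toy data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=2)
rules = generate_rules(frequent, min_confidence=0.8)
```

On this toy input the only rule that reaches the 0.8 confidence threshold is butter -> bread. The counting pass inside the `while` loop is repeated once per level over the whole dataset, which is the iterative, data-parallel part that a platform such as Spark would be expected to distribute.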
Keywords/Search Tags: association rule mining, parallel computing, Spark, high utility itemset