
Research Of Parallel Frequent Itemset Mining Algorithm Based On Spark

Posted on: 2019-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Du
GTID: 2428330548472432
Subject: Computer application technology
Abstract/Summary:
With the development of the information society, every industry is producing more and more data, and extracting useful information from these irregular data has become increasingly important. Association rules are now widely used across many fields and have become a hot topic in data mining. A central task at present is applying association analysis efficiently to massive data sets. For a long time, the Frequent-Pattern Growth (FP-Growth) algorithm has been the core algorithm of association rule mining, but on massive data its execution time becomes too long. The most effective remedy is concurrent (parallel) processing.

This paper improves the data structure of the original FP-Growth algorithm, merges single paths that satisfy certain conditions to optimize the FP-tree, and deploys the improved serial algorithm on a Spark cluster. These changes reduce the number of iterations and improve the efficiency of the algorithm. The resulting improved Spark FP-Growth algorithm is called IPFP. We designed experiments comparing IPFP with the traditional FP-Growth algorithm deployed on Spark and with an improved FP-Growth algorithm deployed on Hadoop. The results show that IPFP has the lowest running time across different support thresholds and transaction volumes.

IPFP is then applied to book recommendation in a university library, and a library book recommendation algorithm based on IPFP is designed: IPFP extracts frequent itemsets and association rules from the library's loan database. We designed experiments comparing the IPFP-based recommendation algorithm with a recommendation algorithm based on collaborative filtering and a traditional recommendation algorithm based on association rules. The results show that the IPFP-based algorithm achieves the best accuracy and recall of the three.
Keywords/Search Tags:association rule, concurrent processing, FP-Growth, Spark, book recommendation
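To make the pipeline described in the abstract more concrete, here is a minimal single-machine Python sketch of the textbook FP-Growth algorithm. It includes the standard shortcut for a conditional FP-tree that collapses to a single path (every combination of its nodes is frequent, so no further recursion is needed), followed by association-rule generation and a toy book recommender. It is an illustration under stated assumptions, not the thesis's IPFP algorithm: it does not reproduce the authors' data-structure changes or their single-path merging conditions, and it runs in one process rather than on Spark (Spark MLlib ships its own parallel FP-Growth, `pyspark.ml.fpm.FPGrowth`, which is not used here). The function names, the loan records, the thresholds, and the max-confidence scoring in `recommend` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    """FP-tree node: item label, count, parent link, children keyed by item."""

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs; return (root, header table)."""
    support = defaultdict(int)
    for items, weight in weighted:
        for item in set(items):
            support[item] += weight
    frequent = {i: c for i, c in support.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, weight in weighted:
        # Keep frequent items only, in descending support order (ties by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return root, header


def single_path(root):
    """Return the nodes of the tree if it is a single chain, otherwise None."""
    chain, node = [], root
    while node.children:
        if len(node.children) > 1:
            return None
        node = next(iter(node.children.values()))
        chain.append(node)
    return chain


def fp_growth(transactions, min_count):
    """Map every frequent itemset (frozenset) to its absolute support count."""
    result = {}

    def mine(weighted, suffix):
        root, header = build_tree(weighted, min_count)
        chain = single_path(root)
        if chain is not None:
            # Single-path shortcut: every combination of chain nodes is frequent;
            # its support is the smallest count among the chosen nodes.
            for r in range(1, len(chain) + 1):
                for combo in combinations(chain, r):
                    result[suffix | {n.item for n in combo}] = min(n.count for n in combo)
            return
        for item, nodes in header.items():
            new_suffix = suffix | {item}
            result[new_suffix] = sum(n.count for n in nodes)
            # Conditional pattern base: the prefix path above each node of `item`.
            base = []
            for n in nodes:
                prefix, p = [], n.parent
                while p.item is not None:
                    prefix.append(p.item)
                    p = p.parent
                if prefix:
                    base.append((prefix, n.count))
            if base:
                mine(base, new_suffix)

    mine([(t, 1) for t in transactions], frozenset())
    return result


def association_rules(itemsets, min_conf):
    """Emit (antecedent, consequent, confidence) rules with confidence >= min_conf."""
    rules = []
    for itemset, count in itemsets.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / itemsets[lhs]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


def recommend(rules, borrowed, top_n=3):
    """Hypothetical scoring: best confidence of any rule whose antecedent was borrowed."""
    scores = {}
    for lhs, rhs, conf in rules:
        if lhs <= borrowed:
            for book in rhs - borrowed:
                scores[book] = max(scores.get(book, 0.0), conf)
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]


# Hypothetical loan records: the books each reader borrowed.
loans = [["DB", "OS"], ["DB", "Net", "Spark", "Java"], ["OS", "Net", "Spark", "Math"],
         ["DB", "OS", "Net", "Spark"], ["DB", "OS", "Net", "Math"]]
freq = fp_growth(loans, min_count=3)
rules = association_rules(freq, min_conf=0.7)
print(recommend(rules, {"Spark"}))  # → ['Net']
print(recommend(rules, {"OS"}))  # → ['DB', 'Net']
```

Support here is an absolute transaction count (`min_count`) rather than a fraction; dividing by `len(loans)` converts one into the other. Only the per-tree mining is sketched; distributing the work across a cluster is the part the thesis addresses with Spark.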