Research On Parallel Acceleration Algorithm Of Association Rules Based On Hadoop

Posted on: 2020-10-03  Degree: Master  Type: Thesis
Country: China  Candidate: Y Cheng  Full Text: PDF
GTID: 2428330590495542  Subject: Computer application technology
Abstract/Summary:
With the rapid development of network information technology, the amount of data is growing explosively, which poses a severe challenge to data mining technology. Traditional data mining techniques become inefficient on such data and in some cases cannot complete at all. Big data and cloud computing technology offer a good solution to these problems: their distributed storage and computing model relieves bottlenecks such as large memory demand and heavy disk I/O. Association rule mining is one of the most classical and mature techniques in data mining. Its main function is to find relationships between items in a data set. In this thesis, the classical association rule algorithms Apriori and Fp-Growth are improved and parallelized on Hadoop. The main research contents are as follows.

The Apriori algorithm generates a large number of candidate itemsets, scans the transaction set multiple times, and consumes a large amount of time. To address these defects, a pruning strategy is applied within the MapReduce programming model to improve the original Apriori algorithm, effectively reducing its computational complexity. On this basis, HBase is introduced to further improve the MapReduce-based algorithm (MR-Apriori), effectively improving data access efficiency.

The Fp-Growth algorithm is an optimization of Apriori that avoids generating a large number of candidate itemsets and scanning the transaction set many times. However, Fp-Growth still suffers from large memory consumption and long computing time when mining massive data sets or when the minimum support is low. In this thesis, the Fp-Tree is first pruned effectively with a merge pruning strategy, Fp-Growth is then parallelized on Hadoop, and load balancing is achieved with a dynamic grouping method. The resulting algorithm is called HDGFP.

The improved algorithms are compared and analyzed on a Hadoop cluster. The experimental results show that the improved Hadoop-based Apriori and Fp-Growth algorithms have higher efficiency and good scalability. Although Fp-Growth is more efficient than Apriori, it fails because of excessive memory consumption when the minimum support is low, whereas Apriori does not.
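
The abstract does not give the details of the thesis's MR-Apriori algorithm, so the following is only a minimal single-machine sketch of the general idea it names: level-wise Apriori in which support counting is split into a map step and a reduce step, and candidates are pruned with the Apriori property before they are counted. All function names (`map_count`, `reduce_count`, `generate_candidates`, `apriori`) are hypothetical, and neither Hadoop nor HBase is actually used here.

```python
from collections import Counter
from itertools import combinations


def map_count(transaction, candidates):
    """Map step: emit (candidate, 1) for every candidate contained in one transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if c <= items]


def reduce_count(pairs, min_support):
    """Reduce step: sum the counts per candidate and keep the frequent ones."""
    totals = Counter()
    for candidate, count in pairs:
        totals[candidate] += count
    return {c: n for c, n in totals.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune every candidate
    that has an infrequent (k-1)-subset (the Apriori property)."""
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(transactions, min_support):
    """Level-wise mining; each loop iteration simulates one map/reduce round."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    result = {}
    k = 1
    while candidates:
        # On a cluster, each mapper would process its own split of the transactions.
        pairs = [p for t in transactions for p in map_count(t, candidates)]
        frequent = reduce_count(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in apriori(data, min_support=3).items():
        print(sorted(itemset), support)
```

Pruning before counting is what keeps the number of candidates shipped to each mapper small. In a real Hadoop job the two steps would be implemented as Mapper and Reducer classes, and the candidate set would have to be distributed to every mapper, which is the kind of data access the abstract says HBase is used to speed up.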
Keywords/Search Tags:Hadoop, HBase, Combined Pruning, Dynamic Grouping, Apriori, Fp-Growth