Research on a Parallel Apriori Algorithm Based on the MapReduce Model

Posted on: 2019-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: S H Zhang
GTID: 2428330578468416
Subject: Mechanical and electrical engineering

Abstract/Summary:
Nowadays, with the wide application of the Internet at home and abroad, society has developed rapidly. Enterprises and smart terminals generate huge amounts of production data and social networking data every day. Traditional data mining algorithms are incapable of dealing with data at this scale. As a result, more and more scholars are devoting themselves to the parallelization of data mining algorithms. Based on an in-depth study of data mining theory and the MapReduce parallel computing framework, this thesis improves the traditional Apriori algorithm. The main work of this thesis includes:

(1) Because the traditional Apriori algorithm performs poorly on large data sets, the MRApriori (MapReduce Apriori) algorithm is proposed, which uses the MapReduce model with transaction database partitioning. First, the local frequent itemsets on each child node in the cluster are obtained. Then all the local frequent itemsets are combined into a global candidate itemset. Finally, the itemsets that satisfy the minimum support threshold are kept as frequent itemsets. The improved algorithm needs to scan the transaction database only twice and computes frequent itemsets in parallel, improving execution efficiency.

(2) Because candidate sets are still generated by serial self-joining of frequent itemsets, MRApriori produces a large number of candidate itemsets. MRSApriori (MapReduce Split Apriori), a parallel Apriori algorithm based on a grouping statistics strategy, is therefore proposed. This algorithm generates the candidate (k+1)-itemsets from the frequent k-itemsets, so that the whole process of generating frequent itemsets runs in parallel. The algorithm reduces the number of candidate sets during iteration, improving the efficiency of mining frequent itemsets.

(3) A data mining system based on the MRSApriori algorithm is designed and built. The system adopts the classical B/S (browser/server) architecture to perform parallel association rule mining on the shopping basket data set of a large supermarket, and can discover purchase rules for commodities that are often bought together.
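
The two-pass idea in item (1) can be illustrated with a minimal sketch, simulated in a single Python process rather than on a MapReduce cluster. The function names, the round-robin partitioning, and the use of a relative support threshold are assumptions made for illustration; the abstract does not give the thesis's actual implementation, and the grouping strategy of MRSApriori in item (2) is not covered here.

```python
from collections import Counter
from itertools import combinations
import math


def local_frequent_itemsets(partition, min_support_ratio):
    """Plain Apriori on one partition (hypothetical helper).

    Returns the set of itemsets that are frequent within this partition.
    """
    min_count = math.ceil(min_support_ratio * len(partition))
    # Count single items to get the locally frequent 1-itemsets.
    counts = Counter(frozenset([item]) for t in partition for item in t)
    current = {s for s, n in counts.items() if n >= min_count}
    frequent = set(current)
    k = 1
    while current:
        # Self-join: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        counts = Counter(c for t in partition for c in candidates if c <= t)
        current = {s for s, n in counts.items() if n >= min_count}
        frequent |= current
        k += 1
    return frequent


def mine_two_pass(transactions, min_support_ratio, num_partitions):
    """Two scans of the data: local mining, then global counting."""
    transactions = [frozenset(t) for t in transactions]
    # Assumed partitioning scheme: round-robin over the transaction list.
    partitions = [transactions[i::num_partitions]
                  for i in range(num_partitions)]

    # Pass 1 ("map" side): mine each partition independently and
    # take the union of local results as the global candidate set.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent_itemsets(part, min_support_ratio)

    # Pass 2: count every global candidate per partition and sum
    # the counts ("reduce" side).
    totals = Counter()
    for part in partitions:
        for t in part:
            for c in candidates:
                if c <= t:
                    totals[c] += 1

    # Keep only itemsets that meet the global minimum support.
    min_count = math.ceil(min_support_ratio * len(transactions))
    return {s: n for s, n in totals.items() if n >= min_count}
```

Taking the union of local results is safe because an itemset that falls below the relative threshold in every partition also falls below it globally, so no globally frequent itemset is lost in pass 1. Calling `mine_two_pass(baskets, 0.5, 2)` on a list of item sets returns a dictionary that maps each frequent itemset to its global count.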
Keywords: Data mining, Apriori algorithm, Association rules, MapReduce model