Research On Association Rules Algorithm Based On Hadoop

Posted on:2019-10-30

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Ni

Full Text:PDF

GTID:2428330551956986

Subject:Information and Communication Engineering

Abstract/Summary:

With the explosive growth of data, efficiently extracting useful information from large datasets has become a research hotspot in the field of big data. Data mining plays an important role in finding the value behind the data, and association rule mining, which discovers relationships among data items, is an important research direction within it. As a core distributed platform for cloud computing, Hadoop provides distributed storage and parallel computing components, which offer strong support for the parallel design and implementation of mining algorithms. This thesis studies association rule mining algorithms based on Hadoop. The main contributions are as follows.

First, an improved Apriori algorithm based on the FP-tree is proposed, which speeds up Apriori by reducing the amount of data scanned. The improved algorithm compresses the data with an FP-tree and improves Apriori through tail partition, dynamic data reduction, and fast support counting. Because the improved algorithm cannot handle big data effectively on a single machine, a parallel version is designed and implemented on Hadoop. Experimental results show that the proposed algorithm not only mines faster on a single machine, but also achieves good speedup and data scalability in a cluster environment, making it suitable for mining large datasets.

Second, the parallelization of the FP-Growth algorithm is analyzed, and the PFP algorithm, a parallel form of FP-Growth, is analyzed and improved. PFP does not consider load imbalance among groups in its grouping stage, which limits its overall performance. To address this, a load-balancing PFP algorithm is proposed. The improved algorithm constructs a new load prediction model for load estimation: the model first samples the data, and then computes a weighted combination of each item's position in the header table and the item's total count in the sampled data. Experimental results show that the improved load-balancing PFP algorithm achieves higher overall mining performance, with good speedup and data scalability.
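
The load-balanced grouping idea described for the improved PFP algorithm can be illustrated with a minimal Python sketch. It is not the thesis's actual prediction model, whose formula is not given in this abstract. The weighting used here (sampled count multiplied by header-table position), the greedy heaviest-first assignment, and the names `estimate_loads` and `balance_groups` are all assumptions made for illustration.

```python
from collections import Counter
import heapq


def estimate_loads(sample, min_count):
    """Estimate a per-item load from sampled transactions.

    Assumed weighting (not taken from the thesis): an item's load is its
    sampled count times its 1-based position in the header table, where the
    header table lists frequent items in descending order of count.
    """
    counts = Counter(item for txn in sample for item in set(txn))
    frequent = [(item, c) for item, c in counts.items() if c >= min_count]
    # Header-table order: descending count, ties broken by item name.
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return {item: c * (pos + 1) for pos, (item, c) in enumerate(frequent)}


def balance_groups(loads, num_groups):
    """Assign items to groups, heaviest first, always into the lightest group."""
    heap = [(0, g) for g in range(num_groups)]  # (current total load, group id)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    for item, load in sorted(loads.items(), key=lambda il: (-il[1], il[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    totals = [sum(loads[i] for i in grp) for grp in groups]
    return groups, totals


if __name__ == "__main__":
    # Hypothetical sampled transactions, for illustration only.
    sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "d"]]
    loads = estimate_loads(sample, min_count=2)
    groups, totals = balance_groups(loads, num_groups=2)
    print(loads, groups, totals)
```

In a Hadoop setting, the resulting item-to-group mapping would presumably replace the default grouping used before the parallel FP-Growth step, so that each reducer receives a comparable estimated workload.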

Keywords/Search Tags:

Hadoop, MapReduce, Data Mining, Association Rules, Apriori, FP-Growth, Parallel Algorithm

Related items

1	Research On Association Rules Mining Methods Of Mass Engineering Data Based On Hadoop
2	Research On A Parallel Data Mining Algorithm Apriori
3	The Research And Implementation Of Parallel Association Rules Algorithm Based On Cloud Environment Data Mining
4	Mining Association Rules Algorithm Analysis Based On Hadoop
5	Parallel Association Rules Algorithm Based On Hadoop
6	Research Of Parallel Association Rules Algorithm Based On Hadoop
7	Research And Application Of Multidimensional Data Constructing And Association Rules Mining Algorithm Based On Mapreduce
8	Research Of Parallel Apriori Algorithm Based On MapReduce Model
9	Improvement And Parallel Processing Of Association Rules Algorithm On Data Mining
10	Research On Parallel Acceleration Algorithm Of Association Rules Based On Hadoop