Parallel Association Rules Algorithm Based On Hadoop Platform

Posted on: 2016-04-09; Degree: Master; Type: Thesis
Country: China; Candidate: Y X Yang; Full Text: PDF
GTID: 2308330461986676; Subject: Computer technology
Abstract/Summary:
Association rule mining is an important and widely applied field of data mining. As society develops rapidly, living standards rise and human activity grows more frequent, so the volume of generated data keeps increasing, in some cases growing at the TB or PB scale. Faced with mining tasks of this size, existing serial algorithms (such as Apriori) and traditional parallel algorithms (such as CD and DD) based on MPI or other programming models are no longer adequate. The Hadoop platform, an open-source implementation of the MapReduce model that Google published in 2004, not only handles node failures, which traditional programming models such as MPI cannot, but also offers good scalability and dynamic load balancing. Research on parallel association rule mining algorithms based on this platform is therefore urgently needed. The main work of this dissertation is as follows.

1) This dissertation shows theoretically that the DHP, Eclat, and FP_Growth algorithms can each be parallelized on the Hadoop platform (see Chapter 3).

2) Based on the method of generating frequent itemsets with a hash table, this dissertation proposes an improved parallel strategy for the Hadoop platform, yielding the H_DHP algorithm, and implements it. The frequent itemsets are stored in an HBase database to improve the efficiency of association rule mining. DHP and H_DHP are then compared experimentally in terms of running time, speedup, and scalability (see Chapter 4).

3) Based on the vertical data layout used by the Eclat algorithm, this dissertation proposes an improved parallel strategy for the Hadoop platform, called the H_Eclat algorithm, and implements it. The frequent itemsets are stored in an HBase database to improve the efficiency of association rule mining. Eclat and H_Eclat are then compared experimentally in terms of running time, speedup, and scalability (see Chapter 5).

4) Because the FP_Growth algorithm does not need to generate candidate itemsets and instead produces frequent itemsets through pattern growth over parts that do not interfere with one another, this dissertation proposes an improved parallel strategy for the Hadoop platform, yielding the H_FP_Growth algorithm. The frequent itemsets are stored in an HBase database to improve the efficiency of association rule mining. FP_Growth and H_FP_Growth are then compared experimentally in terms of running time, speedup, and scalability (see Chapter 6).
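
As a rough illustration of the vertical data layout that Eclat relies on, the following single-machine Python sketch maps each item to the set of transaction ids containing it and finds frequent itemsets by intersecting those sets. It is not the H_Eclat algorithm of the dissertation and involves neither Hadoop nor HBase; the function names `to_vertical` and `eclat` and the parameter `min_support` (an absolute count) are assumptions made for this example.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tidset) that contain it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    vertical = to_vertical(transactions)
    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that all share the same prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the later items to build the next level.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


if __name__ == "__main__":
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for itemset, count in sorted(eclat(baskets, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

Each prefix's candidate list can be processed without reference to the others, which is the property a parallel version would presumably exploit when distributing work across nodes.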
Keywords/Search Tags: Hadoop, H_DHP algorithm, H_Eclat algorithm, H_FP_Growth algorithm, Association rules