
Research On Parallel Association Rules Algorithm Based On HADOOP Platform

Posted on: 2018-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: L D Ma
Full Text: PDF
GTID: 2358330515499075
Subject: Computer technology
Abstract/Summary:
In data mining, association rule algorithms have become an important means of extracting deeper value from data because they are simple, efficient, and widely applicable. In the era of big data, how to extract valuable knowledge from huge amounts of data more rapidly, at lower cost, and more efficiently, and so help policymakers make better decisions, has become a new topic in data mining. The emergence of cloud computing brings new solutions for massive data mining. Hadoop, developed by the Apache Software Foundation, is an open source implementation of cloud computing technology; its core technologies are the Hadoop Distributed File System (HDFS) and the parallel programming framework MapReduce. Building on in-depth study of traditional data mining algorithms, improving those algorithms by combining them with MapReduce so that they can handle huge amounts of data has become a hotspot in the field. First, this paper studies cloud computing, HDFS, and MapReduce in detail. Then the concepts of data mining and association rules are expounded, the classical Apriori algorithm for association rules is analyzed in detail, and a concrete example is given. Next, the Apriori algorithm is combined with the Hadoop platform for a parallel implementation. Then, by introducing a matrix data structure, an improved association rule mining algorithm based on Hadoop and the matrix is proposed, drawing on the characteristics of the matrix and the properties of the Apriori algorithm. Finally, a Hadoop experimental environment is built, and the improved algorithm is implemented and debugged in Java. The algorithm is tested on different experimental data sets and under different experimental conditions. Comparative analysis of the experimental results shows that the improved algorithm has better performance.
Keywords/Search Tags: data mining, association rules, Hadoop, matrix
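
The abstract does not describe how the thesis lays out its matrix, so the following is only a minimal single-machine sketch of one common way to pair a matrix with Apriori, not the author's algorithm. It assumes a Boolean transaction matrix (rows are transactions, columns are items) and counts the support of an itemset as the number of rows in which every one of its columns is true. The class name MatrixSupportSketch, the method names, and the sample data are all hypothetical, and no Hadoop or MapReduce API is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: support counting over a Boolean transaction matrix.
public class MatrixSupportSketch {

    // Count the transactions (rows) that contain every item (column) in itemset.
    static int supportCount(boolean[][] matrix, int[] itemset) {
        int count = 0;
        for (boolean[] row : matrix) {
            boolean containsAll = true;
            for (int item : itemset) {
                if (!row[item]) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                count++;
            }
        }
        return count;
    }

    // Return the item pairs whose support count reaches minSupport.
    // By the Apriori property, only pairs of frequent single items are tested.
    static List<int[]> frequentPairs(boolean[][] matrix, int itemCount, int minSupport) {
        List<Integer> frequentItems = new ArrayList<>();
        for (int i = 0; i < itemCount; i++) {
            if (supportCount(matrix, new int[] {i}) >= minSupport) {
                frequentItems.add(i);
            }
        }
        List<int[]> pairs = new ArrayList<>();
        for (int a = 0; a < frequentItems.size(); a++) {
            for (int b = a + 1; b < frequentItems.size(); b++) {
                int[] pair = {frequentItems.get(a), frequentItems.get(b)};
                if (supportCount(matrix, pair) >= minSupport) {
                    pairs.add(pair);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up sample data: 4 transactions over 3 items (columns 0, 1, 2).
        boolean[][] matrix = {
            {true, true, false},
            {true, true, true},
            {true, false, true},
            {false, true, false}
        };
        for (int[] p : frequentPairs(matrix, 3, 2)) {
            System.out.println("{" + p[0] + ", " + p[1] + "}");
        }
    }
}
```

In a MapReduce setting, a counting step like this would typically be split so that each mapper scans its own block of rows and a reducer sums the per-block counts. Whether the thesis partitions the work this way is not stated in the abstract.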