
The Research Of Parallel Association Rules Mining Algorithms Based On Cloud Platform

Posted on: 2015-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: W J Mao
GTID: 2268330425485466
Subject: Computer software and theory
Abstract/Summary:
With the rapid development and spread of computer, communication, and network technology, databases are now widely used in every area of social life. Accumulated data easily reaches terabytes or even petabytes. Such data is often noisy, large, heterogeneous, and complex, so it is difficult to use directly. How to extract valuable knowledge from huge amounts of data more quickly, cheaply, and efficiently, and thereby help policymakers make better decisions, has become a new topic in data mining.

The emergence of cloud computing brings new solutions for mining massive data. Hadoop, developed by the Apache Software Foundation, is an open-source implementation of cloud computing technology; its core components are the Hadoop Distributed File System (HDFS) and the parallel programming framework MapReduce. Building on in-depth study of traditional data mining algorithms, improving those algorithms by combining them with MapReduce so that they can handle huge amounts of data has become a hotspot in the field.

This thesis studies cloud computing, HDFS, and MapReduce in detail, and describes the technical architecture of a data mining system based on Hadoop. Then, after further study of the traditional association rule mining algorithm Apriori, it gives a parallel processing strategy for Apriori and puts forward an improved parallel algorithm, AprioriMR. Building on that work and introducing the concepts of the power set and the matrix, it proposes two further improved association rule mining algorithms: AprioriPMR, based on Hadoop and the power set, and AprioriMMR, based on Hadoop and the matrix.

Finally, the thesis sets up an experimental environment that combines Hadoop and HBase, implements the improved algorithms in Java, and tests their validity on different data sets and under different experimental conditions. Comparative analysis of the experimental results shows that the improved algorithms perform more efficiently.
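
The abstract does not give the details of AprioriMR, so the following is only a minimal sketch of the general idea of running Apriori's support-counting pass in a MapReduce style. It is plain Java with no Hadoop API: each "split" stands in for an HDFS block handled by one mapper, and all class and method names (AprioriPassSketch, mapSplit, reduce, generateCandidates, mine, tx) are hypothetical. Candidate generation here is a simple union of frequent itemsets and omits Apriori's subset-pruning step, and nothing below reflects the power-set or matrix variants.

```java
import java.util.*;

// Plain-Java simulation of level-wise Apriori with MapReduce-style counting.
// Assumption: names and structure are illustrative, not taken from the thesis.
class AprioriPassSketch {

    // Helper to build one transaction (a set of items).
    static Set<String> tx(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // "Map" step: for one split, count each candidate contained in a transaction.
    static Map<Set<String>, Integer> mapSplit(List<Set<String>> split,
                                              List<Set<String>> candidates) {
        Map<Set<String>, Integer> local = new HashMap<>();
        for (Set<String> t : split) {
            for (Set<String> c : candidates) {
                if (t.containsAll(c)) {
                    local.merge(c, 1, Integer::sum);
                }
            }
        }
        return local;
    }

    // "Reduce" step: sum per-split counts, keep candidates with count >= minSupport.
    static Map<Set<String>, Integer> reduce(List<Map<Set<String>, Integer>> partials,
                                            int minSupport) {
        Map<Set<String>, Integer> total = new HashMap<>();
        for (Map<Set<String>, Integer> p : partials) {
            p.forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        total.values().removeIf(v -> v < minSupport);
        return total;
    }

    // Build size-k candidates by uniting pairs of frequent (k-1)-itemsets.
    // The classic "all subsets must be frequent" prune is omitted for brevity.
    static List<Set<String>> generateCandidates(Set<Set<String>> frequent, int k) {
        Set<Set<String>> out = new HashSet<>();
        for (Set<String> a : frequent) {
            for (Set<String> b : frequent) {
                Set<String> union = new TreeSet<>(a);
                union.addAll(b);
                if (union.size() == k) {
                    out.add(union);
                }
            }
        }
        return new ArrayList<>(out);
    }

    // Driver: one map/reduce round per itemset size, until nothing is frequent.
    static Map<Set<String>, Integer> mine(List<List<Set<String>>> splits, int minSupport) {
        Set<Set<String>> singles = new HashSet<>();
        for (List<Set<String>> split : splits) {
            for (Set<String> t : split) {
                for (String item : t) {
                    singles.add(Collections.singleton(item));
                }
            }
        }
        List<Set<String>> candidates = new ArrayList<>(singles);
        Map<Set<String>, Integer> allFrequent = new HashMap<>();
        int k = 1;
        while (!candidates.isEmpty()) {
            List<Map<Set<String>, Integer>> partials = new ArrayList<>();
            for (List<Set<String>> split : splits) {
                partials.add(mapSplit(split, candidates));
            }
            Map<Set<String>, Integer> frequent = reduce(partials, minSupport);
            if (frequent.isEmpty()) {
                break;
            }
            allFrequent.putAll(frequent);
            k++;
            candidates = generateCandidates(frequent.keySet(), k);
        }
        return allFrequent;
    }

    public static void main(String[] args) {
        List<Set<String>> split1 = Arrays.asList(tx("bread", "milk"), tx("bread", "butter"));
        List<Set<String>> split2 = Arrays.asList(tx("bread", "milk", "butter"), tx("milk"));
        System.out.println(mine(Arrays.asList(split1, split2), 2));
    }
}
```

In a real Hadoop job the per-split counting would live in a Mapper, the summation and support filter in a Reducer, and each itemset size would typically be a separate job; the sketch keeps everything in memory only to show the data flow.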
Keywords/Search Tags: Data mining, Association rules, MapReduce, Power set, Matrix