
The Research And Implementation Of Parallel Association Rules Algorithm Based On Cloud Environment Data Mining

Posted on: 2016-10-27; Degree: Master; Type: Thesis
Country: China; Candidate: Z M Xie; Full Text: PDF
GTID: 2428330473464839; Subject: Software engineering
Abstract/Summary:
With the rapid development of computer, communications, networking, and Web technologies, data has grown explosively: the data accumulated in every area of society easily reaches the GB and TB levels, and even the PB level. More than 80% of this data is unstructured and difficult to use directly, a problem that has become more apparent in the era of big data. Mature cloud computing techniques can be used to mine potentially valuable knowledge from these vast amounts of data quickly and efficiently. The emergence of cloud computing addresses the low efficiency of traditional data mining algorithms on massive heterogeneous data. Hadoop, an open-source top-level project of the Apache Software Foundation, uses MapReduce and HDFS as its key technologies for mining massive data. On this basis, this thesis integrates the traditional association rule algorithm Apriori with the Hadoop platform to verify how the efficiency of the data mining algorithm changes between the "cloud" and "traditional" environments. The thesis first describes the system architecture of Hadoop, examines in depth the operating mechanisms of MapReduce and HDFS, the core components of the open-source Hadoop framework, and designs a cloud mining model that combines a Hadoop-based system with a traditional data mining system. Second, it introduces how the cloud computing platform is built and deployed, along with common Shell commands. Third, it studies the traditional Apriori association rule algorithm in depth and ports it to the Hadoop platform to verify its effectiveness. To make better use of the cloud platform, it then introduces the concept of a matrix and designs a new improved algorithm, Apriori_MMR. Finally, real data is used on the Hadoop platform to validate the correctness, feasibility, and efficiency of the algorithm; comparative analysis of the experimental results shows that the improved Apriori_MMR algorithm performs better. In short, improving algorithms with cloud computing brings a new way of thinking to data mining, and cloud mining will be a future trend in data mining research. By combining a traditional data mining algorithm with cloud computing, this thesis offers a reference for improving other data mining algorithms. I also believe that more algorithms will be ported to the Hadoop cloud platform in the near future.
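As a rough illustration of the idea of running Apriori in a MapReduce style, the sketch below counts candidate itemsets per data partition (a "map" step) and sums the partial counts (a "reduce" step). It is a minimal single-process Python sketch of the classic Apriori algorithm, not the thesis's Hadoop implementation and not Apriori_MMR, whose matrix-based design the abstract does not specify. The function names `map_count`, `reduce_counts`, and `apriori`, and the use of an absolute support count for `min_support`, are assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step (illustrative): count candidate itemsets in one partition."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step (illustrative): sum the per-partition counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def apriori(partitions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    partitions  -- list of partitions, each a list of transactions
    min_support -- minimum absolute support count (an assumption of this sketch)
    """
    all_items = {item for part in partitions for t in part for item in t}
    candidates = {frozenset([item]) for item in all_items}
    frequent = {}
    k = 1
    while candidates:
        total = reduce_counts(map_count(p, candidates) for p in partitions)
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

In a real Hadoop job, each partition would correspond to an input split stored in HDFS, and every pass over `candidates` would be a separate MapReduce round; here the partitions are simply Python lists processed one after another.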
Keywords/Search Tags: Cloud Computing, Data Mining, Hadoop, MapReduce, Association Rules, Apriori Algorithm