The Research Of Data Mining Based On Cloud Platform

Posted on:2017-03-19  Degree:Master  Type:Thesis
Country:China  Candidate:M J Chen  Full Text:PDF
GTID:2308330509950228  Subject:Software engineering
Abstract/Summary:
In an era of information explosion, social networks generate digital data of every kind: images, video, blogs, online community content, and more. As data sources and data types diversify and data volume grows explosively, traditional data mining can no longer meet our requirements, and a new and effective mechanism is needed to analyze and process massive data. Data mining on cloud computing platforms emerged in response. A cloud platform offers massive storage space, which makes it easy to store large amounts of data, and its high scalability lets programmers build scalable applications seamlessly on top of its services. Therefore, if traditional data mining algorithms are transformed and deployed onto a cloud platform, the difficulty of processing massive data can be addressed.

However, deploying traditional classical algorithms onto a cloud computing platform architecture encounters many problems: (1) the iteration dependence of the algorithms is the biggest bottleneck; (2) the communication cost of looping over big data generates heavy load; (3) traditional algorithms are slow when handling massive data, and their I/O and network costs are very high.

To address the problems that traditional data mining technology encounters on a cloud computing platform, this paper first describes the theory of cloud computing platforms and data mining. It then analyzes the related technologies, namely cloud computing, the MapReduce parallel programming model, the Hadoop Distributed File System (HDFS), and data mining on the Hadoop framework, and proposes a data mining processing mechanism to solve the problem of large-scale data processing and obtain high performance. Second, it optimizes the K-Means algorithm for data mining on cloud computing, which eliminates the iteration dependence and reduces the computation cost of the algorithm. Finally, the optimized K-Means algorithm is ported to the Hadoop platform for testing: the Hadoop cluster invokes the K-Means algorithm through the MapReduce programming model in order to observe the effectiveness of the data mining algorithm on the cloud platform.
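
To make the K-Means and MapReduce pairing concrete, the following is a minimal single-process sketch of the standard (Lloyd's) K-Means iteration written in a map/combine/reduce shape. It is an illustration under stated assumptions, not the optimized algorithm of this thesis: the function names (map_point, combine, reduce_centroids, kmeans) are hypothetical, no Hadoop API is used, and the whole data set is assumed to fit in memory. The driver loop also shows the iteration dependence named in problem (1): each pass needs the centroids produced by the previous pass, so on a real cluster every iteration would be a separate job.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def combine(pairs):
    """Combine/shuffle step: sum coordinates and counts per centroid index."""
    sums = {}
    for key, (point, count) in pairs:
        if key in sums:
            total, n = sums[key]
            sums[key] = ([a + b for a, b in zip(total, point)], n + count)
        else:
            sums[key] = (list(point), count)
    return sums


def reduce_centroids(sums, centroids):
    """Reduce step: new centroid is the mean of its assigned points.

    A centroid that received no points keeps its old position.
    """
    updated = []
    for i, old in enumerate(centroids):
        if i in sums:
            total, n = sums[i]
            updated.append(tuple(x / n for x in total))
        else:
            updated.append(tuple(old))
    return updated


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Driver: repeat map -> combine -> reduce until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        pairs = [map_point(p, centroids) for p in points]
        updated = reduce_centroids(combine(pairs), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids
```

Summing coordinates and counts before the reduce step mirrors the role a combiner plays in MapReduce: only one partial sum per cluster needs to cross the network instead of every point, which relates to the communication cost named in problem (2).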
Keywords/Search Tags:Cloud computing, Data Mining, MapReduce, HDFS, Clusters, K-Means