The Process And Research Of Massive Data Mining Based On Cloud Computing

Posted on: 2014-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: J C Ba
Full Text: PDF
GTID: 2268330425993249
Subject: Computer application technology
Abstract/Summary:
We used data mining techniques to rapidly extract valuable rules and patterns from massive, noisy data, making the data easier to understand and use. We selected cloud computing techniques to process the massive data because of their low cost, high throughput, good compatibility, and stability.

In this work, we first introduced the key technologies of cloud computing, data mining, and the Hadoop architecture. We then optimized SPRINT, a typical classification algorithm in data mining. After analyzing MapReduce, the key programming model of cloud computing, in depth and expressing the algorithm in that model, we provided a detailed algorithm design and implementation. We then ported the algorithm to the Hadoop platform for distributed computation. Finally, we analyzed the advantages and disadvantages of SPRINT through experimental verification.

The experiments show that the execution time of a single loop iteration decreases significantly as the number of nodes in the cluster increases. This implies that the algorithm distributes the computation well across the Hadoop cluster to achieve parallelization, thus improving scalability and reducing execution time.
Keywords/Search Tags:Cloud computing, Data mining, SPRINT, MapReduce, HADOOP
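
As a rough illustration of how a SPRINT-style split search can be expressed in the MapReduce model described in the abstract, the sketch below simulates the map, shuffle, and reduce phases in plain Python. It is not the thesis's implementation and does not use Hadoop: the function names (`map_record`, `reduce_attribute`, `best_split`), the record layout, and the restriction to numeric attributes are all assumptions made for this example. The map step emits one attribute-list entry per attribute, and the reduce step scans each sorted attribute list to find the split point with the lowest weighted Gini index.

```python
from collections import Counter, defaultdict


def gini(counts, total):
    """Gini index of a partition given its class counts and size."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def map_record(record):
    """Map step (hypothetical): emit (attribute, (value, label)) per attribute."""
    features, label = record
    for attr, value in features.items():
        yield attr, (value, label)


def reduce_attribute(attr, entries):
    """Reduce step (hypothetical): scan one sorted attribute list for its best split."""
    entries = sorted(entries, key=lambda e: e[0])
    total = len(entries)
    above = Counter(label for _, label in entries)  # class counts right of the split
    below = Counter()                               # class counts left of the split
    best_score, best_threshold = float("inf"), None
    for i in range(total - 1):
        value, label = entries[i]
        below[label] += 1
        above[label] -= 1
        next_value = entries[i + 1][0]
        if value == next_value:
            continue  # no valid split point between equal values
        n_below = i + 1
        n_above = total - n_below
        score = (n_below * gini(below, n_below)
                 + n_above * gini(above, n_above)) / total
        if score < best_score:
            best_score, best_threshold = score, (value + next_value) / 2
    return attr, best_score, best_threshold


def best_split(records):
    """Simulate map -> shuffle -> reduce and pick the best attribute split."""
    groups = defaultdict(list)  # the "shuffle": group map output by attribute
    for record in records:
        for attr, entry in map_record(record):
            groups[attr].append(entry)
    results = [reduce_attribute(attr, entries) for attr, entries in groups.items()]
    return min(results, key=lambda r: r[1])
```

Because each attribute list is reduced independently, the reduce calls could in principle run on different nodes, which is the property that lets the per-iteration work spread across a cluster.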