The Process And Research Of Massive Data Mining Based On Cloud Computing

Posted on:2014-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:J C Ba

Full Text:PDF

GTID:2268330425993249

Subject:Computer application technology

Abstract/Summary:

We used data mining techniques to rapidly extract valuable rules and patterns from massive, noisy data, making the data easier to understand and use. Because of its low cost, high throughput, good compatibility, and stability, we chose cloud computing to process data at this scale.

In this thesis, we first introduce the key technologies of cloud computing, data mining, and the Hadoop architecture. We then optimize SPRINT, a typical classification algorithm in data mining. After analyzing MapReduce, the key programming model of cloud computing, in depth and expressing the algorithm in that model, we present a detailed algorithm design and implementation. We then port the algorithm to the Hadoop platform for distributed computing. Finally, we analyze the advantages and disadvantages of SPRINT through experimental verification.

The experiments show that the execution time of one loop unit decreases significantly as the number of nodes in the cluster increases. This implies that the algorithm distributes the computation well across the Hadoop cluster to achieve parallelization, which improves scalability and reduces execution time.
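To illustrate how a SPRINT-style split search might be phrased in map and reduce steps, here is a minimal single-process Python sketch. It is not the thesis's implementation, which the abstract does not describe in detail. It assumes numeric attributes, binary splits scored by the Gini index (the criterion SPRINT uses), and one reduce call per attribute list; the function names `map_record`, `reduce_attribute`, and `best_split` are hypothetical, and the shuffle phase is simulated with a dictionary rather than Hadoop.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini impurity of a mapping from class label to count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def map_record(record):
    """Map step: emit one (attribute, (value, label)) pair per attribute."""
    features, label = record
    for name, value in features.items():
        yield name, (value, label)


def reduce_attribute(name, entries):
    """Reduce step: scan one sorted attribute list for its best binary split."""
    entries = sorted(entries)
    n = len(entries)
    below = Counter()                                # class counts left of the cut
    above = Counter(label for _, label in entries)   # class counts right of the cut
    best = (float("inf"), None)                      # (weighted Gini, threshold)
    for i in range(n - 1):
        value, label = entries[i]
        below[label] += 1
        above[label] -= 1
        next_value = entries[i + 1][0]
        if value == next_value:
            continue                                 # no cut between equal values
        left = i + 1
        score = (left / n) * gini(below) + ((n - left) / n) * gini(above)
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return name, best


def best_split(records):
    """Simulate shuffle: group mapped pairs by attribute, reduce, pick the minimum."""
    groups = defaultdict(list)
    for record in records:
        for key, pair in map_record(record):
            groups[key].append(pair)
    results = [reduce_attribute(key, pairs) for key, pairs in groups.items()]
    return min(results, key=lambda r: r[1][0])


records = [
    ({"age": 23, "income": 40}, "high"),
    ({"age": 30, "income": 60}, "high"),
    ({"age": 45, "income": 50}, "low"),
    ({"age": 52, "income": 30}, "low"),
]
print(best_split(records))
```

On this toy data the search selects `age` with a threshold of 37.5, which separates the two classes perfectly. In a real cluster each attribute list could be handled by a different reducer, which is one plausible way the per-iteration work spreads across nodes.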

Keywords/Search Tags:

Cloud computing, Data mining, SPRINT, MapReduce, HADOOP

Related items

1	The Research Of Data Mining Based On HADOOP
2	Data Mining Based On Hadoop Platform
3	Research On Massive Digital Image Data Mining Based On Hadoop Cloud Platform
4	Parallel Data Mining Algorithm Research In Cloud
5	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
6	Based On The Parallel Implementation Of Multi-node Data Mining Algorithm
7	Research Of Massive Data Processing And Mining In Database Marketing Based On Hadoop
8	Research On Decision Tree Mining Algorithm Based On Cloud Computing
9	Parallel Algorithms Research Based On Hadoop And Hama
10	Research And Implementation Of Data Classification Algorithm Based On Decision Tree