Parallel Implementation Of A Multi-node Data Mining Algorithm

Posted on:2013-08-12

Degree:Master

Type:Thesis

Country:China

Candidate:H L Yu

Full Text:PDF

GTID:2248330374986377

Subject:Computer System Architecture

Abstract/Summary:

With the development of the information age, the rapid growth of data has become a serious problem, and data mining methods are needed to deal with such vast amounts of data. Data mining can discover unknown, hidden, and potentially valuable knowledge to support decision making, and this knowledge can then be used to solve practical problems.

The traditional Apriori algorithm runs on a single node and does not adapt well to massive data processing. To meet the demands of processing massive data, there is an urgent need to implement the mining algorithm on multiple nodes, executing it across the nodes in parallel so that it runs with a high degree of parallelism.

In this paper we make an in-depth study of MapReduce. Based on simple tests and an analysis of Hadoop's MapReduce framework model, we put forward an innovative task scheduling algorithm (DWSA). The algorithm balances load adaptively by dynamically monitoring the number of tasks in the system and serving tasks with a priority-based approach. We also improve the basket storage model: a Boolean matrix is used to store the data, and at the same time a new association rule mining algorithm is proposed, which makes good use of vectors to carry out the mining. The new model and algorithm are then run on the optimized platform to improve the efficiency of data mining.
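The abstract does not give the details of the Boolean matrix model, so the following is only a minimal single-node sketch of the general idea, not the thesis's algorithm. It assumes each item is stored as a Boolean column vector over the transactions, and that the support of an itemset is the number of positions where all of its column vectors are true. The function names (`build_boolean_matrix`, `support`, `frequent_itemsets`) and the Apriori-style level-wise candidate generation are illustrative assumptions.

```python
from itertools import combinations


def build_boolean_matrix(transactions):
    """Return (items, columns): one Boolean vector per item, one entry per transaction."""
    items = sorted({item for t in transactions for item in t})
    columns = {item: [item in t for t in transactions] for item in items}
    return items, columns


def support(columns, itemset):
    """Count transactions containing every item: AND the column vectors, then sum."""
    vectors = [columns[item] for item in itemset]
    return sum(all(bits) for bits in zip(*vectors))


def frequent_itemsets(transactions, min_support):
    """Level-wise search (assumed Apriori-style) using only the Boolean columns."""
    items, columns = build_boolean_matrix(transactions)
    result = {}
    current = [(item,) for item in items]  # candidate 1-itemsets
    k = 1
    while current:
        counted = {c: support(columns, c) for c in current}
        frequent = {c: n for c, n in counted.items() if n >= min_support}
        result.update(frequent)
        # Build (k+1)-candidates whose k-subsets are all frequent.
        k += 1
        survivors = sorted({i for c in frequent for i in c})
        current = [
            c for c in combinations(survivors, k)
            if all(sub in frequent for sub in combinations(c, k - 1))
        ]
    return result


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
        {"bread", "milk", "beer"},
    ]
    for itemset, count in frequent_itemsets(baskets, min_support=3).items():
        print(itemset, count)
```

Because the support count touches only the column vectors, the original transactions need to be scanned just once to build the matrix. In a multi-node setting the rows could plausibly be partitioned across nodes and the partial counts summed, though the abstract does not say how the thesis distributes the work.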

Keywords/Search Tags:

Cloud computing, Data mining, MapReduce, Hadoop

Related items

1	Data Mining Based On Hadoop Platform
2	Research On Massive Digital Image Data Mining Based On Hadoop Cloud Platform
3	Parallel Data Mining Algorithm Research In Cloud
4	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
5	Parallel Implementation Of A Multi-node Data Mining Algorithm
6	Research Of Massive Data Processing And Mining In Database Marketing Based On Hadoop
7	Parallel Algorithms Research Based On Hadoop And Hama
8	The Design Of The Cloud Computing System Based On Hadoop
9	Researches About Cloud Computing And Exploit And Test Hadoop Program
10	Design And Implementation Of Visual Data Platform Based On MapReduce