
Based On The Parallel Implementation Of Multi-node Data Mining Algorithm

Posted on: 2013-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: H L Yu
GTID: 2248330374986377
Subject: Computer system architecture
Abstract/Summary:
With the development of the information age, the rapid growth of data has become a serious problem, and data-mining methods are needed to handle such vast amounts of data. Data mining can discover unknown, hidden and potentially valuable knowledge that supports decision-making, and this knowledge can then be used to solve practical problems.

The traditional Apriori algorithm runs on a single node and does not adapt well to massive data processing. To meet the demands of processing mass data, the mining algorithm urgently needs to be implemented across multiple nodes, so that it executes in parallel on those nodes with a high degree of parallelism.

In this paper we study MapReduce in depth. Through simple tests and analysis of Hadoop's MapReduce framework model, we propose a new task-scheduling algorithm, DWSA. DWSA balances the load adaptively by dynamically monitoring the number of tasks in the system and serving tasks according to their priority. We also improve the basket storage model: the data are stored in a Boolean matrix, and a new association-rule mining algorithm is proposed that makes good use of vector operations to carry out the mining. Finally, the new model and algorithm are run on the optimized platform to improve the efficiency of data mining.
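The abstract states that the improved basket model stores data as a Boolean matrix and mines association rules with vector operations, but it gives no details. The following is a minimal sketch of that general idea, assuming each item's column of the transaction-by-item Boolean matrix is packed into a Python integer used as a bit vector, so that the support of an itemset is the number of 1 bits in the AND of its columns. The level-wise candidate generation is the standard Apriori join-and-prune, not necessarily the thesis's algorithm, and all function names are hypothetical.

```python
from itertools import combinations

def build_item_vectors(transactions):
    """Hypothetical helper: map each item to a bit vector (a Python int)
    whose bit t is set when transaction t contains the item, i.e. one
    column of the Boolean transaction-by-item matrix."""
    vectors = {}
    for t, basket in enumerate(transactions):
        for item in set(basket):
            vectors[item] = vectors.get(item, 0) | (1 << t)
    return vectors

def support(itemset, vectors):
    """Support count = number of 1 bits in the AND of the item columns."""
    items = list(itemset)
    acc = vectors[items[0]]
    for item in items[1:]:
        acc &= vectors[item]
    return bin(acc).count("1")

def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search where every support count is a
    vector AND plus a popcount instead of a rescan of the baskets."""
    vectors = build_item_vectors(transactions)
    # Level 1: single items whose column has enough 1 bits.
    level = [frozenset([i]) for i in sorted(vectors)
             if support([i], vectors) >= min_support]
    result = {s: support(s, vectors) for s in level}
    k = 2
    while level:
        # Join: union pairs of frequent (k-1)-itemsets that yield k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent.
        prev = set(level)
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        level = []
        for c in candidates:
            n = support(c, vectors)
            if n >= min_support:
                result[c] = n
                level.append(c)
        k += 1
    return result
```

The bit-vector layout means an itemset's support is computed without scanning the baskets again, which is the kind of saving a vector-based storage model aims for; a real implementation on large data would more likely use fixed-width word arrays or a library bitmap type.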
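To show how the single-node counting step of Apriori can be spread over multiple nodes, here is a minimal sketch of one support-counting pass written in MapReduce style. It is not Hadoop code and not the thesis's implementation: Python threads stand in for cluster nodes, a strided split of the transaction list stands in for HDFS input splits, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_phase(split, candidates):
    """Map task: count every candidate itemset contained in a transaction
    of this input split. Counts are pre-aggregated locally, like a Hadoop
    combiner, to shrink the data sent to the reducer."""
    local = Counter()
    for basket in split:
        items = set(basket)
        for c in candidates:
            if c <= items:
                local[c] += 1
    return local

def reduce_phase(partial_counts, min_support):
    """Reduce task: sum the partial counts per itemset, keep frequent ones."""
    total = Counter()
    for part in partial_counts:
        total.update(part)  # Counter.update adds counts key by key
    return {c: n for c, n in total.items() if n >= min_support}

def parallel_count(transactions, candidates, min_support, n_splits=2):
    """One Apriori counting pass: split the data, run the map tasks
    concurrently, then reduce. Threads stand in for cluster nodes."""
    splits = [transactions[i::n_splits] for i in range(n_splits)]
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        partials = list(pool.map(lambda s: map_phase(s, candidates), splits))
    return reduce_phase(partials, min_support)
```

Because support counts are additive across disjoint splits, each node can count independently and a single reduce step yields the same totals as a single-node scan; in a full Apriori run this pass would be repeated once per itemset size.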
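The abstract names only two ingredients of DWSA: dynamic monitoring of the number of tasks in the system, and priority-based service. The sketch below combines those two ideas in a generic heuristic, assuming a priority queue of waiting tasks (lower number means more urgent) and dispatch to the node that currently runs the fewest tasks. It is explicitly not DWSA, whose actual rules the abstract does not give; the class and method names are hypothetical.

```python
import heapq
import itertools

class PriorityLeastLoadedScheduler:
    """Hypothetical sketch, not the thesis's DWSA: tasks wait in a priority
    queue (lower number = higher priority) and each dispatch goes to the
    node currently running the fewest tasks."""

    def __init__(self, nodes):
        self.running = {node: 0 for node in nodes}  # monitored task count per node
        self._queue = []
        self._seq = itertools.count()  # FIFO tie-breaker for equal priorities

    def submit(self, task, priority):
        heapq.heappush(self._queue, (priority, next(self._seq), task))

    def dispatch(self):
        """Assign the most urgent queued task to the least-loaded node.
        Returns (task, node), or None when no task is waiting."""
        if not self._queue:
            return None
        _, _, task = heapq.heappop(self._queue)
        # Least-loaded node; ties broken by node name for determinism.
        node = min(self.running, key=lambda n: (self.running[n], n))
        self.running[node] += 1
        return task, node

    def complete(self, node):
        """Called when a node reports that one of its tasks has finished."""
        self.running[node] -= 1
```

For example, with nodes `"n1"` and `"n2"`, submitting tasks with priorities 5, 1 and 3 dispatches the priority-1 task first, and every dispatch re-reads the current task counts, so a node that reports completions quickly receives more work.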
Keywords/Search Tags: Cloud computing, Data mining, MapReduce, Hadoop