
The Research And Implementation Of K-Medoids Clustering Algorithm Based On Density And Hadoop

Posted on: 2016-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhou
Full Text: PDF
GTID: 2308330470465705
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, data volumes are growing explosively, and traditional data mining algorithms can no longer process large data sets efficiently enough. How can useful information be extracted from large amounts of data? To address this problem, this thesis analyses a classic clustering algorithm, K-Medoids, and, building on the widely used Hadoop platform, proposes a parallel density-based K-Medoids algorithm. The main work consists of the following two points.

1. The traditional K-Medoids clustering algorithm has a known weakness: because the k initial cluster centers are chosen at random, the clustering results vary considerably from run to run. This thesis presents a density-based K-Medoids algorithm. The algorithm first clusters the initial data on the basis of density, and then selects k cluster centers of higher density as the initial cluster centers for K-Medoids. Experimental results show that the accuracy of the proposed algorithm is higher than that of the traditional K-Medoids clustering algorithm.

2. To reduce the delay the algorithm incurs when processing massive data, the proposed algorithm is parallelized with the MapReduce framework on the Hadoop platform. The key work in this process has two parts. First, the algorithm must be decomposed into multiple Jobs, and the tasks of the map phase and the reduce phase of each Job must be determined. Second, the keys and values must be designed according to what each Job needs. Experiments show that the larger the amount of data and the more nodes in the cluster, the greater the difference in processing time.

The final part summarizes the research in this thesis, discusses its deficiencies, and identifies the further work to be done.
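The density-based initialization and the K-Medoids iteration described in the abstract can be illustrated with a minimal single-machine sketch. The abstract does not give the thesis's exact density definition or Job decomposition, so everything specific here is an assumption: density is taken to be the number of points within a radius `eps`, initial medoids are the densest points that lie more than `eps` apart, and the function names (`density_init`, `assign`, `update`, `k_medoids`) are hypothetical. The `assign` and `update` steps are written separately because they resemble what a map phase and a reduce phase could each do; this is an illustration, not the thesis's actual implementation.

```python
import math


def density_init(points, k, eps):
    """Pick up to k initial medoid indices, densest first (assumed rule)."""
    # Assumed density: number of points (including itself) within radius eps.
    density = [sum(1 for q in points if math.dist(p, q) <= eps) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    medoids = []
    for i in order:
        # Skip candidates too close to an already chosen medoid, so that
        # the initial medoids come from different dense regions.
        if all(math.dist(points[i], points[m]) > eps for m in medoids):
            medoids.append(i)
        if len(medoids) == k:
            break
    return medoids


def assign(points, medoids):
    """Map-like step: attach every point to its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def update(points, clusters):
    """Reduce-like step: per cluster, pick the member with the smallest
    total distance to all other members as the new medoid."""
    return [
        min(members, key=lambda c: sum(math.dist(points[c], points[j]) for j in members))
        for members in clusters.values()
    ]


def k_medoids(points, k, eps, max_iter=100):
    """Run K-Medoids starting from density-based initial medoids."""
    medoids = density_init(points, k, eps)
    for _ in range(max_iter):
        new_medoids = update(points, assign(points, medoids))
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    return medoids, assign(points, medoids)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    medoids, clusters = k_medoids(data, k=2, eps=1.0)
    print(medoids, clusters)
```

Because the starting medoids are chosen deterministically from dense regions rather than at random, repeated runs on the same data give the same result, which is the instability the abstract attributes to the traditional algorithm. If `eps` is so large that fewer than k well-separated dense points exist, this sketch returns fewer than k medoids.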
Keywords/Search Tags:Hadoop, Density-based cluster, K-Medoids, MapReduce