The Research And Implementation Of K-Medoids Clustering Algorithm Based On Density And Hadoop

Posted on:2016-06-15

Degree:Master

Type:Thesis

Country:China

Candidate:H Zhou

GTID:2308330470465705

Subject:Computer application technology

Abstract/Summary:

With the rapid development of Internet technology, data volumes are growing explosively, and traditional data mining algorithms cannot process large data sets efficiently enough. How can useful information be extracted from large amounts of data? To address this problem, this thesis analyses a classic clustering algorithm, K-Medoids, and, building on the widely used Hadoop platform, proposes a density-based parallel K-Medoids algorithm. The main work consists of the following two points.

1. Traditional K-Medoids clustering chooses its k initial cluster centers at random, so its clustering results vary widely from run to run. This thesis presents a density-based K-Medoids algorithm. The algorithm first clusters the initial data on the basis of density, and then selects the cluster centers of the k higher-density classes as the initial cluster centers for K-Medoids. Experimental results show that the proposed algorithm is more accurate than the traditional K-Medoids clustering algorithm.

2. To reduce the delay the algorithm incurs when processing massive data, the proposed algorithm is parallelized with the MapReduce framework on the Hadoop platform. The key work in this process has two parts. First, the algorithm must be decomposed into multiple Jobs, and the tasks of the map phase and the reduce phase of each Job must be determined. Second, the keys and values must be designed to fit those tasks. Experiments show that the greater the amount of data and the more cluster nodes, the greater the difference in processing time.

The final part concludes and summarizes the research, sets out its deficiencies, and identifies the further work to be done.
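The two ideas in the abstract, density-based selection of the initial medoids and a split of each iteration into a map step and a reduce step, can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not define density, so the sketch assumes density means the number of points within a hypothetical `radius`, and the map and reduce steps are simulated in-process with plain Python functions rather than the Hadoop API. All function names are illustrative.

```python
import math
from collections import defaultdict

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance (Python 3.8+)

def density_seeds(points, k, radius):
    """Pick up to k initial medoids: densest points first, skipping any
    candidate that lies within `radius` of an already chosen seed.
    Assumption: density = number of points within `radius` (self included)."""
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if all(dist(points[i], points[s]) > radius for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds  # may hold fewer than k indices if radius is too large

def map_assign(i, points, medoids):
    """Map step: emit (key, value) = (nearest medoid index, point index)."""
    nearest = min(medoids, key=lambda m: dist(points[i], points[m]))
    return nearest, i

def reduce_update(members, points):
    """Reduce step: the new medoid is the member with the smallest
    total distance to the other members of its cluster."""
    return min(members,
               key=lambda c: sum(dist(points[c], points[j]) for j in members))

def k_medoids(points, k, radius, max_iter=20):
    medoids = sorted(density_seeds(points, k, radius))
    groups = {}
    for _ in range(max_iter):
        groups = defaultdict(list)
        for i in range(len(points)):
            key, value = map_assign(i, points, medoids)
            groups[key].append(value)
        new_medoids = sorted(reduce_update(g, points) for g in groups.values())
        if new_medoids == medoids:  # medoids stable: converged
            break
        medoids = new_medoids
    return medoids, dict(groups)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
medoids, clusters = k_medoids(points, k=2, radius=1.0)
print(medoids, clusters)
```

In this sketch the key is the index of the nearest medoid and the value is the index of the point, which is one plausible key/value design; on a real cluster each iteration would correspond to one Job, with the current medoids distributed to every mapper.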

Keywords/Search Tags:

Hadoop, Density-based cluster, K-Medoids, MapReduce

Related items

1	Research And Optimization On K-medoids Clustering Algorithm Based On Hadoop Platform
2	Study On Parallel Algorithm Of K-Medoids Based On MapReduce
3	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
4	The Optimization Of High Performance MapReduce FairScheduler And The Implementation On Simulator Of Huge Scale Cluster
5	Research On Hadoop Cluster Scheduling Optimization
6	Research On Distributed SVM Algorithm Based On Hadoop Platform
7	An Optimized MapReduce Workflow Scheduling Algorithm For Heterogeneous Computing
8	Research On Data Cube Technology Based On MapReduce
9	Research And Implementation Of Expansibility Oriented Cluster Architecture
10	Two Kinds Of Improvement On K-medoids