
Study On Parallel Algorithm Of K-Medoids Based On MapReduce

Posted on: 2016-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: W Dai
Full Text: PDF
GTID: 2348330482481450
Subject: Software engineering
Abstract/Summary:
In the big data era, information is growing explosively, and mining accurate information from massive data sets is of great significance to today's information society. K-Medoids is a partition-based clustering algorithm, but its high time complexity and its traditional strategy of selecting the initial centers at random can no longer meet the requirements of clustering massive data. Parallelizing the algorithm with the MapReduce computing model improves its running efficiency, but does not solve the problems of clustering accuracy and convergence on large data volumes, so these problems must be addressed in the algorithm itself. To address the traditional K-Medoids algorithm's sensitivity to the initial cluster centers, its slow convergence, and the performance bottleneck of a single machine on massive data, this thesis improves both the selection of the initial cluster centers and the center replacement step. It combines the MapReduce parallel programming model with a distributed random sampling strategy to implement a new, efficient K-Medoids algorithm, and uses the storage and computing characteristics of Hadoop to optimize the algorithm further. Compared with the traditional K-Medoids algorithm and the K-Means algorithm, the improved K-Medoids algorithm achieves a good speedup in a cluster environment and also improves clustering accuracy and convergence to some extent.
Keywords/Search Tags: K-Medoids, distributed computation, Hadoop, parallel sampling
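
The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of how a generic K-Medoids iteration might be split into map and reduce phases, with initial medoids drawn from per-split random samples. It is plain Python that simulates the two phases in one process, not Hadoop code, and it is not the thesis's actual method. All names (`map_assign`, `reduce_update`, `k_medoids`, `sample_per_split`) are hypothetical, and the Euclidean distance and the simple "recompute the medoid of each cluster" update rule are assumptions.

```python
import random
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two points given as tuples (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def map_assign(points, medoids):
    """Map phase: emit (index of the nearest medoid, point) for each point."""
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: distance(p, medoids[i]))
        yield nearest, p


def reduce_update(members):
    """Reduce phase: return the member with the smallest total distance to the rest."""
    return min(members, key=lambda c: sum(distance(c, m) for m in members))


def k_medoids(splits, k, max_iter=20, sample_per_split=5, seed=0):
    """Iterate map and reduce phases over data splits until the medoids stop changing."""
    rng = random.Random(seed)
    # Sample each split independently, then draw the k initial medoids
    # from the pooled sample (a stand-in for distributed random sampling).
    pooled = []
    for split in splits:
        pooled.extend(rng.sample(split, min(sample_per_split, len(split))))
    medoids = rng.sample(pooled, k)

    for _ in range(max_iter):
        # "Shuffle": group the mapper output of every split by medoid index.
        groups = defaultdict(list)
        for split in splits:
            for key, point in map_assign(split, medoids):
                groups[key].append(point)
        # One reducer per cluster; an empty cluster keeps its old medoid.
        new_medoids = [
            reduce_update(groups[i]) if groups[i] else medoids[i]
            for i in range(k)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids
```

Each split here stands in for one block of input handled by a mapper. In a real Hadoop job the grouping step would be done by the framework's shuffle, and the current medoids would have to be distributed to every mapper at the start of each iteration.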