Parallel Clustering Algorithm’s Study And Application Based On HBASE

Posted on: 2015-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Chen
Full Text: PDF
GTID: 2298330467963948
Subject: Computer technology
Abstract/Summary:
Big Data is diverse, dynamic, heterogeneous, and massive. Storing such data effectively and rapidly extracting value from it is one of the most intractable problems in the operation and maintenance of modern enterprises. Traditional (non-parallel) clustering has weaknesses that become more pronounced as data volume surges: for example, its efficiency is low and its performance is unsatisfactory.

This thesis first surveys distributed computing technologies for large data and the related open-source frameworks, and chooses the MapReduce framework, backed by HBase storage and supported by the open-source community, as the distributed computing platform. Second, it chooses the classic and widely used K-means clustering algorithm. To illustrate the effectiveness of a computing platform that runs Hadoop MapReduce on HBase storage, two applications are implemented on it: a mobile positioning service that clusters on latitude and longitude, and a book-clustering service that clusters across 22 dimensions from three perspectives: major, sex, and grade (undergraduate, postgraduate, doctoral).

The results of both applications are analyzed from two aspects. The first is clustering time: a single machine takes longer than the cluster. There is, however, a marginal-benefit problem between clustering time and the number of machines: adding more machines does not always improve the outcome in time and cost. The second is clustering accuracy: the cluster's results differ little from those of the stand-alone run. Finally, plans for improving the framework are outlined, such as dynamic scaling of the cluster, extension of the framework, and security and privacy protection measures.
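To make the pairing of K-means with MapReduce concrete, the following is a minimal sketch of how one K-means iteration can be split into a map step (assign each point to its nearest centroid) and a reduce step (average the points of each centroid). It is not the thesis's implementation: the function names `map_phase`, `reduce_phase`, and `kmeans` are hypothetical, plain Python functions stand in for Hadoop mappers and reducers, an in-memory list stands in for an HBase table, and latitude/longitude pairs are treated as plain Euclidean coordinates for simplicity.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce step: recompute each centroid as the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # A centroid that received no points keeps its previous position.
    new_centroids = list(centroids)
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map + reduce until no centroid moves more than tol (or max_iter is hit)."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids
```

Calling `kmeans(points, initial_centroids)` with a list of `(latitude, longitude)` tuples returns the final centroids. In a distributed setting, the map step is the part that can run in parallel, because each point is assigned independently of the others; each iteration still needs a full map and reduce pass over the data.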
Keywords/Search Tags: big data, Hadoop, HBase, k-means, parallel clustering algorithm