Parallel Clustering Algorithm's Study And Application Based On HBASE

Posted on:2015-07-06

Degree:Master

Type:Thesis

Country:China

Candidate:Z P Chen

Full Text:PDF

GTID:2298330467963948

Subject:Computer technology

Abstract/Summary:

Big data is diverse, dynamic, heterogeneous, and massive in scale. Storing such data effectively and quickly extracting value from it is one of the most intractable problems in the operation and maintenance of modern enterprises. Traditional (non-parallel) clustering has weaknesses that become more pronounced as data volumes surge: clustering efficiency is low and clustering performance is unsatisfactory. This thesis first surveys distributed computing technologies for big data and the related open-source frameworks, and chooses the open-source MapReduce framework, with HBase as the storage layer, as its distributed computing platform. Second, it chooses the classic, widely used K-means clustering algorithm. To demonstrate the effectiveness of the Hadoop MapReduce platform backed by HBase storage, two applications are implemented on it: a mobile positioning service that clusters on latitude and longitude, and a book-clustering service that clusters over 22 dimensions in three ways, by major, sex, and grade (undergraduate, postgraduate, doctoral). The results of the two applications are analyzed from two aspects. The first is clustering time: a single machine takes longer than the cluster. There are also diminishing marginal returns between clustering time and the number of machines in the cluster, so adding machines does not always reduce clustering time and cost. The second is clustering accuracy: the results obtained on the cluster differ little from those obtained on a stand-alone machine. Finally, the thesis outlines future work on the framework, such as dynamically scaling the cluster, extending the framework, and adding security and privacy protection measures.
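The abstract does not show how K-means is expressed as a MapReduce job, so the following is only a minimal sketch of the general idea, written in plain Python rather than against Hadoop or HBase. The function names (`map_phase`, `reduce_phase`, `kmeans`) and the in-memory point list are assumptions made for illustration and are not taken from the thesis. The map step assigns each point to its nearest centroid, and the reduce step averages the points assigned to each centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: replace each centroid with the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # A centroid that received no points is left unchanged (an assumption).
    new_centroids = list(centroids)
    for idx, members in groups.items():
        n = len(members)
        new_centroids[idx] = tuple(sum(coord) / n for coord in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until the centroids stop moving."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Hypothetical 2-D points standing in for (latitude, longitude) pairs.
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce deployment each iteration would typically be a separate job, with the map step running in parallel over partitions of the stored data; this sketch runs everything sequentially in one process.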

Keywords/Search Tags:

bigdata, hadoop, hbase, k-means, parallel clustering algorithm

Related items

1	Design And Implementation Of Mobile Location Data Application System Based On Hadoop
2	Clean Bill System On Bigdata Technology
3	Parallel Clustering Algorithm Based On MapReduce
4	Research And Construction On Performance Management System Based On Hadoop
5	Research On Parallel Acceleration Algorithm Of Association Rules Based On Hadoop
6	Research On Spatial Data Mining Based On Hadoop
7	Hadoop-based Parallel Algorithm For Mining
8	Design And Implementation Of HBase Hierarchical Auxiliary Index System
9	Study On The Data Driven Parallel Incremental SVM Learning Algorithm Based On Hadoop Framework
10	Research On Parallel Data Mining Based On Hadoop