The Research Of A Parallel Clustering Algorithm

Posted on:2012-08-07

Degree:Master

Type:Thesis

Country:China

Candidate:J W Zhang

Full Text:PDF

GTID:2218330368982291

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computer science and technology and the spread of networks, the information that people generate and collect keeps growing in size, scope, and depth. These vast amounts of data contain rich and complex information about composition and function, so there is a growing need to analyze them further.

The prototype-based K-Means algorithm is among the most widely used clustering methods because it is simple, fast, and efficient on large-scale data sets. However, its applicability is limited because it depends heavily on initial conditions: the choice of initial cluster centers affects the clustering results. The bisecting K-Means algorithm (BKM) is a variant of K-Means. BKM produces a partitional or hierarchical clustering by recursively applying basic K-Means, and its results are largely independent of the initial centroids.

Clustering algorithms applied to massive, high-dimensional data have high time and space complexity. For terabyte-scale text data, a cluster system made up of multiple hosts can provide powerful parallel computing capacity. Research on a parallel bisecting K-Means algorithm for cluster environments, which can greatly improve efficiency, is therefore of practical value.

To address the slow clustering speed caused by the way bisecting K-Means selects its initial centroids, this thesis selects the points separated by the maximum distance as the initial cluster centroids. The results show that this approach clusters faster than BKM. The thesis also studies in depth how to accelerate clustering on a cluster system. Based on the characteristics of BKM, it proposes a parallel algorithm built on data parallelism and symmetric data partitioning. Experiments show that the new algorithm achieves a satisfactory speedup ratio and performance.
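
The maximum-distance seeding idea can be illustrated with a small serial sketch. This is not the thesis's implementation: the function names (`farthest_pair`, `two_means`, `bisecting_kmeans`), the brute-force search for the most distant pair, and the rule of always splitting the cluster with the largest sum of squared errors are all assumptions made for illustration, and the parallel, data-partitioned part is not shown.

```python
import math
from itertools import combinations

def farthest_pair(points):
    # Assumption: exhaustive O(n^2) search; the thesis may use another method.
    return max(combinations(points, 2), key=lambda pq: math.dist(pq[0], pq[1]))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sse(points):
    # Sum of squared distances to the cluster centroid.
    c = mean(points)
    return sum(math.dist(p, c) ** 2 for p in points)

def two_means(points, max_iter=20):
    """K-Means with k=2, seeded with the two most distant points."""
    c0, c1 = farthest_pair(points)
    left, right = [], []
    for _ in range(max_iter):
        new_left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        new_right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not new_left or not new_right:
            break  # degenerate split, keep the last valid one
        left, right = new_left, new_right
        new0, new1 = mean(left), mean(right)
        if (new0, new1) == (c0, c1):
            break  # centroids stopped moving
        c0, c1 = new0, new1
    return left, right

def bisecting_kmeans(points, k):
    """Repeatedly split one cluster in two until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break
        # Assumption: split the cluster with the largest SSE.
        target = max(candidates, key=sse)
        left, right = two_means(target)
        if not left or not right:
            break
        clusters.remove(target)
        clusters += [left, right]
    return clusters

# Hypothetical toy data: three well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (30, 0), (30, 1), (31, 0)]
for cluster in bisecting_kmeans(points, 3):
    print(cluster)
```

Because the two seeds are chosen deterministically, repeated runs on the same data give the same split, whereas randomly seeded K-Means may not. The exhaustive pair search is quadratic in the cluster size, so a practical version would likely need a cheaper approximation.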

Keywords/Search Tags:

data mining, clustering, bisecting k-means, parallelism, cluster

Related items

1	A Fast And Efficient Parallel Bisecting K-Means Algorithm
2	Research And Application Of Bisecting K-means Algorithm Analysis Based On Financial Customer Signature
3	Based On K-means The Chinese Text Clustering Algorithm
4	K-means Based On Binary And Svm Decision Tree Algorithm Of Data Mining Research
5	The Research Of K-means Clustering Algorithm In Data Mining
6	Study Of Chinese Text Clustering On Improved K-means Algorithm
7	Research And Application Of K-means Clustering Algorithm
8	Study And Application Of CRM Data Mining Based On Clustering Algorithms
9	Research On Advertisement Recommendation System Based On Data Mining
10	Research And Improvement Of K-Means Clustering Algorithm