The Research Of A Parallel Clustering Algorithm

Posted on: 2012-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zhang
GTID: 2218330368982291
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer science and technology and the spread of networks, the information that people generate and collect keeps growing in size, scope and depth. These vast amounts of data contain rich and complex information about composition and function, which people expect to analyze further.

The K-Means clustering algorithm, a prototype-based technique, is the most widely used clustering method because it is simple, fast and efficient on large-scale data sets. However, its applicability is limited because it depends heavily on its initial conditions: the choice of initial cluster centers affects the clustering results. The bisecting K-Means algorithm (BKM) is a variant of K-Means. BKM can produce either a partitional or a hierarchical clustering by recursively applying the basic K-Means algorithm, and its results do not depend on the initial centroids.

When applied to massive and high-dimensional data, clustering algorithms have high time and space complexity. For massive, TB-level text data, a cluster system made up of multiple hosts can provide powerful parallel computing capacity. Research on a parallel bisecting K-Means algorithm for a cluster environment, which can greatly improve efficiency, therefore has practical value.

To address the slow clustering caused by the way bisecting K-Means selects its initial centroids, this thesis selects the points separated by the maximum distance as the initial cluster centroids. The results show that the clustering speed is better than that of BKM. An in-depth study and analysis is also carried out on how to accelerate clustering in a cluster system.
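The maximum-distance seeding idea can be illustrated with a small sequential sketch. This is not the thesis's implementation: the function names (`farthest_pair`, `two_means`, `bisecting_kmeans`), the exact O(n²) search for the farthest pair, and the rule of always splitting the cluster with the largest sum of squared errors are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points: Sequence[Point]) -> float:
    """Sum of squared distances to the cluster mean."""
    c = mean(points)
    return sum(dist2(p, c) for p in points)


def farthest_pair(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Exact farthest pair by brute force (assumed; O(n^2), small inputs only)."""
    best = (-1.0, 0, 1)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist2(points[i], points[j])
            if d > best[0]:
                best = (d, i, j)
    return points[best[1]], points[best[2]]


def two_means(points: Sequence[Point], max_iter: int = 20):
    """Basic 2-means, seeded with the two mutually farthest points."""
    c0, c1 = farthest_pair(points)
    left: List[Point] = []
    right: List[Point] = []
    for _ in range(max_iter):
        new_left = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        new_right = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not new_left or not new_right:
            break  # degenerate split: keep the last valid one (if any)
        left, right = new_left, new_right
        n0, n1 = mean(left), mean(right)
        if (n0, n1) == (c0, c1):
            break  # centroids stopped moving
        c0, c1 = n0, n1
    return left, right


def bisecting_kmeans(points: Sequence[Point], k: int) -> List[List[Point]]:
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters: List[List[Point]] = [list(points)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters[idx]
        if len(target) < 2 or sse(target) == 0:
            break  # nothing left worth splitting
        left, right = two_means(target)
        if not left or not right:
            break
        clusters[idx:idx + 1] = [left, right]
    return clusters
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same split, which is one way to read the claim that the result no longer depends on a random initial centroid. The brute-force pair search is only practical for small inputs; a cheaper approximation would be needed at scale.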
Based on the characteristics of BKM, a parallel algorithm built on data parallelism and symmetric data partitioning is proposed. Experimental results show that the new algorithm achieves a satisfactory speedup ratio and performance.
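A data-parallel iteration of the 2-means step might look like the sketch below. It assumes that "symmetric data partition" means splitting the data into blocks of (almost) equal size, one per worker; the abstract does not define the term. A thread pool stands in for the hosts of a cluster purely so that the example runs on one machine; in CPython it will not give a real speedup for this CPU-bound work. All names (`split_evenly`, `partial_stats`, `parallel_update`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_evenly(points: Sequence, n_workers: int) -> List[Sequence]:
    """Contiguous blocks whose sizes differ by at most one (assumed partitioning)."""
    q, r = divmod(len(points), n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = q + (1 if w < r else 0)
        blocks.append(points[start:start + size])
        start += size
    return blocks


def partial_stats(block: Sequence[Point], c0: Point, c1: Point):
    """Worker step: per-centroid coordinate sums and counts for one block."""
    dim = len(c0)
    sums = [[0.0] * dim, [0.0] * dim]
    counts = [0, 0]
    for p in block:
        k = 0 if dist2(p, c0) <= dist2(p, c1) else 1
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def parallel_update(points: Sequence[Point], c0: Point, c1: Point,
                    n_workers: int = 4) -> Tuple[Point, Point]:
    """One 2-means iteration: map over the blocks, then reduce on the master."""
    blocks = split_evenly(points, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda b: partial_stats(b, c0, c1), blocks))

    dim = len(c0)
    total = [[0.0] * dim, [0.0] * dim]
    count = [0, 0]
    for sums, counts in results:
        for k in (0, 1):
            count[k] += counts[k]
            for d in range(dim):
                total[k][d] += sums[k][d]

    # A centroid with no assigned points keeps its previous position.
    new = []
    for k, old in ((0, c0), (1, c1)):
        new.append(tuple(t / count[k] for t in total[k]) if count[k] else old)
    return new[0], new[1]
```

Only the small per-block sums and counts travel back to the master, not the points themselves, which is the usual reason data-parallel K-Means variants scale well; whether the thesis uses exactly this reduction is not stated in the abstract.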
Keywords/Search Tags: data mining, clustering, bisecting k-means, parallelism, cluster