Font Size: a A A

Research On Parallel Hierarchical Clustering Algorithm

Posted on:2008-08-09Degree:MasterType:Thesis
Country:ChinaCandidate:L GuoFull Text:PDF
GTID:2178360215979877Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Clustering analysis is one of the important areas in the data mining research. Especially, it is applicable in many fields, such as engineering, business, biology, social sciences and others. However, because of the complexity of the distribution of clustering object at high dimensional feature space, the uncertainty and the flexibility of the evaluation of clustering results and high computing complexity of cluster problem as an optimized problem, clustering algorithm still are confronted with lots of problems and challenges.At present, researchers put forward lots of clustering algorithms. Therein, hierarchical clustering algorithms as a main kind, are paid much attention. So far time complexity of the best sequential algorithms is up to O(n~2), but huge data in biological information or intrusion detection is hard for them to process; most of the parallel algorithms use the CRCW-PRAM or CREW-PRAM models of computing, the cost of these algorithms are O(n~2) or larger, most are randomized or probability algorithms, unable to automatically adapt to the number of processors, and not take account of sharing memory conflicts. This paper proposed a new parallel adaptive deterministic algorithm without memory conflicts based on EREW-SIMD shared memory model by use of complete graph and Euclidean minimum spanning tree, with cost of O(n~2) . Through performance comparison with other algorithms, it is demonstrated that on guarantee of no memory conflicts, the algorithm can run on the weakest PRAM-EREW model at an optimal cost, and automatically adapt to the change of the number of processors.To validate the performance and extensibility of our algorithm in practice, we made several experiments and simulations based on the synthetic and benchmark data sets from Internet on IBM P690 of high performance computing centre. The results showed that our algorithm has a great superiority in both computing time and space, and consequently a stronger adaptability and operability for large scale data sets.
Keywords/Search Tags:High performance computing, Parallel algorithms, Hierarchical clustering, Memory conflicts, EREW-SIMD
PDF Full Text Request
Related items