
Research And Implementation Of Hierarchical Clustering Algorithm Based On MPI

Posted on: 2013-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: X X Li
Full Text: PDF
GTID: 2248330395486736
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an important research direction in data mining. In essence, it is a classification of a data set into groups. It is widely used and plays an important role in machine learning, biology, statistics, marketing, and other fields. The clustering algorithm plays a decisive role in cluster analysis, and hierarchical clustering is one of the main algorithms. Hierarchical clustering is simple and fast, and it is a pillar of cluster analysis. However, the algorithm must compute the distances between all classes, merge classes, and then recompute the distances between classes after each merge, which gives it a high time complexity. As data sizes continue to grow, improving the efficiency of clustering has also become an important research question.

Based on the above analysis of the problems of hierarchical clustering, this thesis improves the traditional algorithm by sorting the inter-class distances in a certain order, to address the recalculation of distances after classes are merged. On this basis, the thesis also combines hierarchical clustering with the Kruskal minimum spanning tree algorithm, which further reduces the overall complexity of the algorithm and improves its scalability.

To further improve execution efficiency, the thesis studies and implements a parallel hierarchical clustering algorithm. A LAN environment, a parallel virtual machine, and Linux are combined to build a cluster system that serves as the parallel computing platform. The parallel program model uses the Message Passing Interface (MPI). The parallel algorithm is evaluated both theoretically and experimentally. The experimental results show that the MPI-based hierarchical clustering algorithm produces the same clustering results as the serial algorithm, while its efficiency is greatly improved.
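
The abstract does not give the thesis's actual procedure, so the following is only a minimal serial sketch of the general idea it describes: compute all pairwise distances once, sort them, and merge classes in Kruskal order with a union-find structure so that no distance is recomputed after a merge. It assumes single-linkage (nearest-neighbor) distance and Euclidean points, neither of which the abstract confirms; the function name `kruskal_single_linkage` and its parameters are hypothetical.

```python
from itertools import combinations
from math import dist


def kruskal_single_linkage(points, k):
    """Merge the closest classes first until only k classes remain.

    Assumption: single-linkage distance, which is what Kruskal-order
    merging over sorted pairwise distances corresponds to.
    """
    n = len(points)
    # Every pairwise distance is computed once and sorted ascending.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))  # union-find forest, one class per point

    def find(x):
        # Follow parent links to the class representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    classes = n
    merges = []  # (point_i, point_j, distance) for each merge performed
    for d, i, j in edges:
        if classes <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the pair joins two different classes
            parent[rj] = ri
            classes -= 1
            merges.append((i, j, d))
    labels = [find(i) for i in range(n)]
    return labels, merges
```

For example, `kruskal_single_linkage([(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)], 3)` groups the first two points together, the next two together, and leaves the last point in its own class. The sort dominates the cost of the merge phase, and the pairwise distance computation is the part that would most naturally be split across MPI processes, although how the thesis partitions the work is not stated here.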
Keywords/Search Tags: cluster analysis, hierarchical clustering, sort, parallel algorithms, MPI