Font Size: a A A

Research On Clustering Algorithm Based On Tree Center Of Gravity And Cut Edge Constraints

Posted on:2021-04-02Degree:MasterType:Thesis
Country:ChinaCandidate:H R LinFull Text:PDF
GTID:2438330626955038Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Clustering method is one of the important techniques for the fields of pattern recognition and data mining.However,partition-based,hierarchical,density-based and minimum spanning tree based methods is not satisfactory due to the different size,flexible shape and arbitrary distribution of the clusters.The results of a large number of research show that,compared to mean and density,the representative points based cluster center method has the better performance,which is insensitive to the noise,outliers and the shape of cluster.In addition,a minimum spanning tree(MST)representation has the robustness to the geometric changes in clusters' boundaries.Therefore,MST based clustering algorithm can overcome the problems that the shape of a cluster and noise have the impact on the performance of clustering method.This paper focuses on the search of representative point,the calculation of inter-cluster distance,and merge criteria in the minimum spanning tree,as well as the construction of MST based fast clustering algorithm.Based on the current research status,this paper mainly carried out the following work:(1)We proposed an inter-cluster distance metric based on the centroid of minimum spanning tree.This method is mainly based on the following reasons.Firstly,the traditional MST based clustering algorithm does not make the best use of the geometry of MST.Next,Euclidean distance metric is sensitive to the shape of cluster.Finally,the traditional representative point selection method has high time complexity.Taking the centroid of minimum spanning tree as the representative point can make full use of its geometric shape.Furthermore,the geodesic distance can adapt to the clusters with arbitrary shape.We can merge two clusters quickly by using link-cut tree and the binary algorithm,which greatly reduces the time complexity of the calculation the representative points.(2)We proposed a pre-clustering method based on breadth-first search algorithm under restricted conditions.In order to prevent the representative point method from degrading to the distance between sample points,the pre-clustering method is proposed to solve the problem that there are too few sample points in the cluster in the early stage,which can meet the requirement of MST centroid based intercluster distance metric in the next stage and further improve the clustering performance.(3)We proposed a multi-stage hierarchical clustering algorithm based on centroid of minimum spanning tree and cut edge constraints.The overall algorithm is divided into three stages: pre-clustering,a small cluster merging process based on cut edge constraint I,and a final clustering process based on cut edge constraint II.The algorithm takes into account two factors: the inter-cluster distance and the internal connection between two adjacent clusters,which makes the algorithm more accurate in the process of merging and reduces the error of the merging process.The experimental results on the synthetic data sets and real data sets demonstrate a good performance of the proposed clustering method.
Keywords/Search Tags:clustering algorithm, minimum spanning tree, centroid of tree, intercluster distance, cut edge constraint
PDF Full Text Request
Related items