Font Size: a A A

Research On Clustering Algorithm Based On Minimum Spanning Tree

Posted on:2014-08-04Degree:MasterType:Thesis
Country:ChinaCandidate:X ChenFull Text:PDF
GTID:2268330392972411Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Data mining is one of the most advanced research field in database technology andinformation decision, and draws nutrition from multiple disciplines. Because clusteringanalysis was an important research direction in data mining, so the further study forclustering analysis had important value in theory and practical application. Thedevelopment of clustering analysis methods had several decades, and researchers trieddifferent approaches to solve clustering problems during this period. Among manymethods proposed, minimum spanning tree clustering algorithm was a typical method.But this kind of existing methods still existed deficiencies, so there was a need toimprove existing clustering analysis technology by proposing a new clustering theoryand method.Firstly, this paper proposed a new minimum spanning tree algorithm calledQMST(Quick Minimum Spanning Tree) on the basis of Prim and Kruskal. QMSTdivided data set into three parts including upper、lower and temporary data set, andreduced the data scale required to be processed greatly, so it could generate a minimumspanning tree quickly, and the required time was better than Prim and Kruskal. When itdealt with high dimensional data set, it introduced a dimensionality reduction strategyaccording to attribute difference principle, so it could still maintain rapid characteristicsin process of generating minimum spanning tree for high dimensional data set. Secondly,this paper proposed a quick self-adaptive clustering analysis method based on minimumspanning tree QSAK-MST(Quick Self-adaptive K-means–Minimum Spanning Tree)by analyzing existing minimum spanning tree clustering algorithms, this algorithmcombined with classification and density clustering thought. This method couldgenerate minimum spanning tree of data set quickly by QMST, it didn’t depend onparameter selection, but compared compactness degree of clusters and divided spanningtrees iteratively by using splitCriterion, then generated the initial clustering centers andthe initial clusters. Then it used the core idea of K-means clustering analysis methodand the initial clustering centers、the initial clusters from the before stage, took theminimal square error function. Each object was assigned to the most similar cluster bymean value of objects in clusters. It updated the mean value of clusters, and repeatedthis process until the function converged or it reached the maximum iteration number.This local adjustment process could effectively alleviate deviation when it divided minimum spanning tree and reduced dimensionality of high dimensional data set, inorder to improve the accuracy of clustering results. QSAK-MST solved the problems ofsetting parameters beforehand and spending too much time on generating minimumspanning tree, it increased clustering speed and improved clustering accuracy. At thesame time, this method took a dimension reduction strategy, so it was still effective indealing with high dimensional data set, and it controlled the clustering error caused bydimension reduction, maintained clustering accuracy within an acceptable range. At last,QSAK-MST and two typical methods based on minimum spanning tree are analyzedand compared by using several real data sets, the experimental results show that thismethod is better in clustering speed、clustering stability and clustering accuracy andverify effectiveness of QSAK-MST.
Keywords/Search Tags:Minimum Spanning Tree, Initial Clustering Centers, Compactness Degree, Split Criterion, Dimension Reduction
PDF Full Text Request
Related items