Font Size: a A A

Research Of Clustering Algorithms Based On Minimum Spanning Tree

Posted on:2013-02-19Degree:MasterType:Thesis
Country:ChinaCandidate:S Q DongFull Text:PDF
GTID:2248330392954617Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years, the data stream as an important data form is paid more and moreattention and is applied in all works of life. For example, the call records of thetelecommunications department, the data stream collected in the weather and the networksecurity and so on. At present, most of the data stream clustering algorithm uses theframework with micro-cluster (online) and macro-cluster (offline). The present clusteringalgorithms based on minimum spanning tree can’t finish the desired clusters efficient andintelligent. In this paper, clusters are generated using the data structure of minimumspanning tree with the idea of micro-clustering. In addition, the other algorithm automaticpartitions a point set into a group of clusters using the minimum spanning tree bymaximizing the overall standard deviation reduction.Firstly, because the minimum spanning tree (MST) clustering algorithm deals withthe large numbers of data in low efficiency, this paper proposes an improved MSTclustering algorithm based on Micro-clusters (Micro-MST-Cluster). The proposedalgorithm has two parts, the micro-clustering based on the density and the division basedon MST. Firstly, with the micro-clustering based on density, the initial clustering is made.And noisy data is detected and eliminated in the process of micro-clustering if the datasetcontains the noisy data. Secondly, the MST with weights is constructed bymicro-clustering, and the next production of k clustering is formed by removing the k-1longest edges of MST, and then according to an objective function, clustering results areoptimized to form the eventual clustering results.Secondly, the self-clustering algorithm based on minimum spanning tree is proposed(SDR-MST-Cluster). The algorithm first constructs the minimum spanning tree based onEuclidean distance using the given data stream and calculates the average standarddeviation of the EMST. Next, the average standard deviation of every sub-tree iscalculated by removing the edge which overall standard deviation reduction is maximum.Iterate this process until the standard deviation belongs to a certain threshold. Additional,the algorithm also uses a fading function to eliminate the influence of the data stream which arrived at different time.Lastly, the algorithms in this article are implemented in Java and they are analyzedand validated.
Keywords/Search Tags:Data stream, Clustering, Minimum spanning tree, Self-clustering, Standarddeviation
PDF Full Text Request
Related items