
Research on MST Variant-Based Semi-Supervised Clustering Algorithms

Posted on: 2013-01-11 | Degree: Master | Type: Thesis
Country: China | Candidate: M M Huo
GTID: 2248330371987125 | Subject: Computer software and theory
Abstract/Summary:
Semi-supervised clustering has become a hot topic in pattern recognition and machine learning in recent years, and it is also an important branch of data mining. It is a hybrid method that combines the advantages of supervised and unsupervised analysis: it improves unsupervised clustering by using a small number of labeled data objects. Although semi-supervised clustering algorithms need fewer labeled objects than supervised methods, they still need a certain number of labeled objects to ensure accuracy. Because labeled objects are difficult to obtain in practice, the available labeled objects are quite few, and this limitation has a strong impact on the performance of the algorithms. Meanwhile, because boundary points are difficult to detect, most existing algorithms cannot find clusters that have multiple densities and arbitrary shapes.

This thesis uses a small amount of prior knowledge to improve clustering quality. Three new semi-supervised algorithms are proposed: MST-based Semi-Supervised clustering using K-labeled objects (K-SSMST), MST-based Semi-Supervised clustering using M-labeled objects (M-SSMST), and Grid-based Semi-Supervised Clustering Algorithm using MST (GSSMST). All three can detect clusters with multiple densities and arbitrary shapes using an extremely small number of labeled objects. K-SSMST automatically finds the natural clusters in a dataset using a variant of the minimum spanning tree (MST) algorithm. It needs no input parameters and requires only K labeled data objects, where K is the number of clusters. M-SSMST can detect new clusters when the given label information is insufficient. GSSMST addresses the high time complexity of MST-based clustering. The algorithms have been tested on various datasets, and the results indicate that they cluster these datasets well.
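
The abstract does not give the details of K-SSMST, so the following is only a minimal sketch of the general idea of MST-based clustering from K labeled objects, not the thesis's algorithm. It assumes a constrained Kruskal procedure: edges are added shortest first, and an edge is skipped whenever it would join two components that already carry different labels. The function name `mst_label_propagation` and the `seeds` argument are hypothetical, chosen for illustration.

```python
import math
from itertools import combinations


def mst_label_propagation(points, seeds):
    """Cluster `points` by growing a spanning forest that never joins two labels.

    points: list of coordinate tuples.
    seeds:  dict mapping point index -> cluster label (assumed: one per cluster).
    Returns a list with one label per point.
    """
    n = len(points)
    parent = list(range(n))
    # Label carried by each component root; None means no labeled object yet.
    comp_label = [seeds.get(i) for i in range(n)]

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (Kruskal order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        li, lj = comp_label[ri], comp_label[rj]
        if li is not None and lj is not None and li != lj:
            # Joining would merge two differently labeled clusters: skip the edge.
            continue
        parent[rj] = ri
        if li is None:
            comp_label[ri] = lj

    return [comp_label[find(i)] for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mst_label_propagation(points, {0: "a", 3: "b"})
```

With one labeled object in each of the two groups, every point ends up with the label of the seed in its own group. The all-pairs edge list in this sketch grows quadratically with the number of points, which illustrates the kind of cost that a grid-based reduction such as the one GSSMST describes might target.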
Keywords/Search Tags: Data mining, Semi-Supervised Clustering, Label Propagation, MST, Grid