Semi-supervised Clustering Algorithm Based On Single Linkage Clustering

Posted on: 2018-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Ding
Full Text: PDF
GTID: 2347330533957192
Subject: Applied statistics
Abstract/Summary:
Semi-supervised learning plays a pivotal role in data mining: it can mine and analyze data effectively and purposefully using a small amount of labeled data. At present, however, semi-supervised learning has been applied widely and maturely mainly to regression and classification. By comparison, research on semi-supervised clustering is neither mature nor broad enough. The K-meansGuider method, proposed by Li Shan in 2010, is a classification-based semi-supervised clustering algorithm that combines the ideas of the K-means algorithm with those of classification-based semi-supervised clustering. Its main idea is to use semi-supervised learning to improve how cluster centers are selected, building on the way K-means searches for cluster centers. The method builds a rough classifier to classify the original data set, extends the cluster-center selection of traditional K-means clustering, and then clusters the whole data set with the K-meansGuider method. The clustering results are integrated, but they depend heavily on the rough classifier, and the method's time efficiency is poor.

Building on the K-meansGuider method, and combining the idea of single linkage from hierarchical clustering with "Clustering by fast search and find of density peaks" proposed by Rodriguez and Laio, this paper proposes a semi-supervised clustering algorithm based on single linkage clustering. It uses a small amount of labeled data to divide the labeled part of the original data set into initial clusters with class labels, and then, following the single linkage idea, clusters the remaining unlabeled data into these initial classes, with a threshold set to avoid errors. The algorithm is implemented on 5 groups of real data from the UCI database and compared with the K-means and K-meansGuider methods. The experimental results show that the improved method gives a more noticeable improvement in clustering quality.
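The abstract does not give the exact procedure, so the following is only a minimal sketch of the general idea it describes: labeled points seed the initial clusters, and each unlabeled point is attached to the cluster of its nearest already-assigned point (the single linkage criterion) as long as that distance stays within a threshold. The function name `semi_supervised_single_linkage`, the use of Euclidean distance, the `-1` marker for unlabeled points, and the rule of stopping once the nearest gap exceeds the threshold are all assumptions made for illustration, not details taken from the thesis.

```python
import numpy as np

def semi_supervised_single_linkage(X, labels, threshold):
    """Grow labeled seed clusters by single linkage (illustrative sketch).

    X         : (n, d) array of points.
    labels    : length-n int array; -1 marks an unlabeled point (assumed convention).
    threshold : largest allowed distance for attaching a point to a cluster.
    Points that never come within `threshold` of a cluster stay at -1.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()

    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    while True:
        assigned = np.flatnonzero(labels != -1)
        unassigned = np.flatnonzero(labels == -1)
        if assigned.size == 0 or unassigned.size == 0:
            break

        # Single linkage: the distance from a point to a cluster is the
        # distance to that cluster's closest member, so the globally closest
        # (unassigned, assigned) pair decides the next merge.
        sub = dist[np.ix_(unassigned, assigned)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)

        # Threshold guard: stop rather than force a distant point into a cluster.
        if sub[i, j] > threshold:
            break
        labels[unassigned[i]] = labels[assigned[j]]

    return labels


# Two seed points (labels 0 and 1), four unlabeled points, one of them an outlier.
X_demo = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [10.0, 0.0], [10.5, 0.0], [50.0, 50.0]]
y_demo = [0, -1, -1, 1, -1, -1]
result = semi_supervised_single_linkage(X_demo, y_demo, threshold=1.0)
print(result.tolist())  # → [0, 0, 0, 1, 1, -1]
```

In this toy run the point at (1.0, 0.0) reaches cluster 0 only through the chain via (0.5, 0.0), which is the characteristic behavior of single linkage, while the outlier stays unassigned because of the threshold.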
Keywords/Search Tags:Semi-supervised learning, Clustering, Single Linkage, K-means