
Semi-Supervised Clustering Algorithm and Its Application and Research

Posted on: 2009-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Jiang
Full Text: PDF
GTID: 2178360272957415
Subject: Computer application technology
Abstract/Summary:
Clustering is an important data analysis technique. Given a metric (a similarity measure, a dissimilarity measure, or a distance), clustering divides a set of individuals into subsets so that, according to a chosen criterion, individuals in the same subset are more similar to each other than to individuals in different subsets. Its purpose is to mine information from a data set. Semi-supervised clustering algorithms use a small amount of prior information, such as labeled data, to improve clustering performance, and they are widely used.

The thesis first reviews the general development of clustering and several clustering techniques. In particular, it introduces metric learning, commonly used clustering methods, and evaluation criteria, which lay the theoretical and experimental foundation for the following chapters. It then describes the existing Semi-Supervised Fuzzy C-means Clustering algorithm in detail and verifies it experimentally.

Second, to check whether this kind of semi-supervised learning can be applied to other clustering algorithms, the thesis introduces it into the Maximum Entropy Clustering algorithm, producing a Semi-Supervised Maximum Entropy Clustering algorithm. Experiments show that semi-supervised learning improves Maximum Entropy Clustering and yields better results.

For heap-like data sets, or data sets in which the number of samples differs greatly between classes, the optimal solution of the FCM algorithm and of the Semi-Supervised Fuzzy C-means algorithm may not be the correct partition of the data, because both algorithms tend to divide a data set into clusters of equal size. To resolve this problem, the thesis finally uses the distribution density of each data point as a weight and combines it with the semi-supervised learning introduced earlier to propose a semi-supervised, dot density weighted Fuzzy C-means algorithm. Experiments show that the algorithm improves clustering accuracy.
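The abstract gives no formulas, so the following Python sketch is only one plausible reading of a semi-supervised, density weighted Fuzzy C-means, not the thesis's own formulation. It assumes three things that the abstract does not state: point density is estimated from the mean distance to the k nearest neighbours, the weights enter only the cluster-center update, and labeled points have their memberships fixed to their known class. The names density_weights and weighted_ss_fcm are hypothetical.

```python
import numpy as np


def density_weights(X, k=5):
    """Assumed density estimate: inverse mean distance to the k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, 1:k + 1]      # column 0 is the self-distance
    w = 1.0 / (knn.mean(axis=1) + 1e-12)
    return w / w.sum()                        # normalise so the weights sum to 1


def weighted_ss_fcm(X, c, labels=None, weights=None, m=2.0,
                    iters=100, tol=1e-6, seed=0):
    """Fuzzy C-means with per-point weights and clamped labeled memberships.

    labels: integer array, -1 for unlabeled points (an assumed encoding).
    Returns (centers, U) where U[k, i] is the membership of point k in cluster i.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    weights = np.full(n, 1.0 / n) if weights is None else weights
    labels = np.full(n, -1) if labels is None else labels
    labeled = labels >= 0
    one_hot = np.eye(c)[labels[labeled]]

    # Random initial memberships, each row summing to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    U[labeled] = one_hot

    for _ in range(iters):
        # Weighted center update: v_i = sum_k w_k u_ki^m x_k / sum_k w_k u_ki^m
        W = weights[:, None] * U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Standard FCM membership update from squared distances.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        U_new[labeled] = one_hot              # keep the supervision fixed

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


if __name__ == "__main__":
    # Two classes with very different sample counts, one labeled point each.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.5, (200, 2)), rng.normal(5, 0.5, (20, 2))])
    labels = np.full(len(X), -1)
    labels[0], labels[200] = 0, 1
    centers, U = weighted_ss_fcm(X, 2, labels=labels, weights=density_weights(X))
    print(centers)
```

Whether a higher density should raise or lower a point's weight, and how the weights interact with the labeled data, are design choices the abstract leaves open; the sketch simply gives denser points more influence on the centers.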
Keywords/Search Tags:Data Mining, Clustering Analysis, Fuzzy C-means Clustering, Maximum Entropy Clustering, Dot Density Weighted, Semi-supervised learning, Labeled Data, Metric Learning