
Research Of Semi-supervised Clustering Ensemble

Posted on: 2013-01-21; Degree: Master; Type: Thesis
Country: China; Candidate: J Y Zhang; Full Text: PDF
GTID: 2248330395453354; Subject: Computer software and theory
Abstract/Summary:
With the advent of the information age, a phenomenon described as "information explosion with knowledge lack" has appeared: data has accumulated far beyond the scope of human processing. Data mining emerged in response and has shown great vitality.

Clustering is one of the key technologies in data mining; it discovers the natural structure in the distribution of data objects. Using a pre-given similarity measure, all data objects are divided into several discrete groups so that objects in the same cluster are more similar to each other than to objects in different clusters.

Clustering ensemble can effectively improve the performance of traditional clustering algorithms. It combines partitions generated by a variety of clustering algorithms, or by the same clustering algorithm with different initial parameters, to obtain better clustering results than a single clustering algorithm. The design of the consensus function is the most important issue in clustering ensemble and the current focus of study.

Semi-supervised clustering ensemble is a new technique that combines semi-supervised clustering and clustering ensemble to enhance ensemble performance. It uses prior knowledge, such as a seed set or pairwise constraints, to obtain a better ensemble result. Compared with unsupervised clustering ensemble, it uses a small amount of information provided by experts or users to guide the ensemble process.

In this thesis, clustering ensemble theory is studied first. Then methods for generating a co-matrix from the base clustering members are studied, and voting on the co-matrix is used to design a consensus function. In addition, label unifying and voting are used to design a second consensus function. These two consensus functions are then used to design two clustering ensemble algorithms.
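The co-matrix and voting idea can be illustrated with a minimal sketch. The abstract does not specify the exact consensus rule, so the following is an assumption rather than the thesis's algorithm: the co-matrix holds, for each pair of objects, the fraction of base partitions that place them in the same cluster; a pair is linked when a majority of partitions agree; and the final clusters are the connected components of those links. The function names `co_association` and `vote_consensus` and the `threshold` parameter are hypothetical.

```python
def co_association(partitions):
    """Fraction of base partitions that put each pair of objects together."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def vote_consensus(co, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold (assumed
    majority vote), then label the connected components."""
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > threshold:
                parent[find(j)] = find(i)

    # Renumber component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three base partitions of six objects; the second uses swapped label
# names and the third misplaces object 2.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
co = co_association(partitions)
consensus = vote_consensus(co)
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

Because the co-matrix only records whether two objects share a cluster, it is unaffected by how each base partition names its clusters, which is why no label alignment is needed in this variant.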
The two algorithms are named "Clustering Ensemble Based on Co-matrix and Voting" and "Clustering Ensemble Based on Label Unifying and Voting".

Second, semi-supervised learning based on collaborative training is described in detail, and the semi-supervised ensemble model SCE is studied. A semi-supervised clustering ensemble algorithm named SCET is then given, which uses tri-training as the consensus function. The algorithm first uses the results of the different base clustering members to translate the original data set into a new feature space matrix, and then generates the final consensus clustering result using a small amount of semi-supervised information. To handle settings in which no semi-supervised information is given, a modified tri-training algorithm named updated tri-training is also given and used as the consensus function of the adaptive semi-supervised clustering algorithm based on collaborative training (UCET).

Finally, experiments are performed to verify the validity of the given algorithms. The evaluation criteria widely used for clustering and clustering ensemble algorithms are first summarized. Experiments are then performed on an artificial data set and several UCI data sets. Lastly, the results of the given algorithms are compared with those of existing clustering and clustering ensemble algorithms. The experimental results show that, compared with the base clustering algorithms and other clustering ensemble algorithms, the algorithms given in this thesis both effectively improve clustering quality and take much less time.
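The first step of SCET, translating the data set into a new feature space built from the base clustering results, might look like the sketch below. The abstract does not say how the matrix is encoded, so the one-hot (binary membership) encoding here is an assumption, and `to_feature_space` is a hypothetical name. The tri-training step that would consume this matrix together with the labeled seeds is not shown.

```python
def to_feature_space(partitions):
    """Return one row per object: the concatenated one-hot cluster
    memberships of that object across all base partitions."""
    n = len(partitions[0])
    rows = [[] for _ in range(n)]
    for labels in partitions:
        clusters = sorted(set(labels))  # fixed column order per partition
        for i, lab in enumerate(labels):
            rows[i].extend(1 if lab == c else 0 for c in clusters)
    return rows


# Two base partitions of four objects; label names need not match
# across partitions because each partition gets its own columns.
base = [
    [0, 0, 1, 1],
    ["a", "b", "b", "b"],
]
features = to_feature_space(base)
for row in features:
    print(row)
```

Each object is now described by its cluster memberships rather than its original attributes, so any supervised or semi-supervised learner could, in principle, be trained on these rows using the few labeled objects available.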
Keywords/Search Tags: Data Mining, Clustering Ensemble, Semi-supervised Clustering Ensemble