Research On Clustering Ensemble And Semi-Supervised Clustering In Data Mining

Posted on: 2011-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: W Tan
Full Text: PDF
GTID: 2178360305461206
Subject: Computer application technology
Abstract/Summary:
In data mining, cluster analysis is an important method for discovering the natural structure of data objects. Given a similarity measure, the data objects are divided into several disjoint groups such that objects in the same group are more similar to one another than to objects in different groups. However, traditional clustering algorithms are unsupervised learning methods: they run without considering any prior knowledge provided by users or the real world, and they tend to partition the data according to different optimization methods and criteria. Many new and improved clustering algorithms have been proposed, but it is still hard to find a single algorithm that can capture the variety of structures in data objects. In recent years, semi-supervised clustering and clustering ensemble have emerged as powerful tools for addressing both of these problems.

Clustering ensemble is inspired by multiple-classifier ensembles. As a relatively new research topic, it has been shown to improve the performance of traditional clustering algorithms. It integrates multiple clustering solutions generated by different algorithms, or by the same algorithm with different initialization parameters, among other sources, and combines them into a final consensus clustering with better performance. Designing the consensus function is the key problem in clustering ensemble. In this thesis, a clustering ensemble algorithm based on the self-organizing feature map (SOM) is proposed. First, the original dataset is transformed into a new feature space matrix using the different clustering solutions. Then, the overall cluster quality of each clustering solution is computed and used as the weight of the corresponding attributes of the new feature space matrix. Finally, the consensus clustering result is generated by a SOM neural network.
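
The feature-space construction described above can be illustrated with a minimal sketch. It assumes that each base clustering is encoded as a one-hot block of cluster memberships and that the quality score of each clustering is already available as a number; the abstract does not specify the encoding or the quality measure, so the function name `ensemble_features`, its parameters, and the example weights are hypothetical. The resulting rows are what a SOM (not shown here) would take as input.

```python
def ensemble_features(labelings, weights):
    """Turn several base clusterings into one weighted feature matrix.

    labelings: list of label lists, one per base clustering (all of length n).
    weights:   one quality score per base clustering (assumed to be given).
    Returns n rows; each row concatenates one weighted one-hot block
    per base clustering.
    """
    if len(labelings) != len(weights):
        raise ValueError("need one weight per base clustering")
    n = len(labelings[0])
    rows = [[] for _ in range(n)]
    for labels, w in zip(labelings, weights):
        if len(labels) != n:
            raise ValueError("all clusterings must label the same objects")
        clusters = sorted(set(labels))  # fixed column order for this block
        for i, lab in enumerate(labels):
            # Weighted indicator: w in the column of the object's cluster.
            rows[i].extend(w if lab == c else 0.0 for c in clusters)
    return rows


# Two hypothetical base clusterings of four objects, with assumed weights.
features = ensemble_features([[0, 0, 1, 1], [1, 1, 1, 0]], [0.5, 1.0])
```

Objects that the higher-weighted clusterings place together end up closer in this space, which is one plausible way for solution quality to influence the consensus step.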
Experimental results show that the proposed algorithm effectively improves clustering performance compared with other clustering ensemble algorithms and with the base clustering algorithms before combination.

Semi-supervised clustering can obtain better results by using prior knowledge, which is often represented as seeds or pairwise constraints. Compared with unsupervised clustering, it uses a small amount of given prior knowledge to guide the clustering process. Pairwise constraints are the most common form of prior knowledge, and many semi-supervised clustering algorithms are based on this type of constraint. In this thesis, the Cop-Kmeans algorithm is introduced in detail, and an improved Cop-Kmeans algorithm is presented to solve its constraint-violation problem. Because many semi-supervised clustering algorithms are sensitive to the order in which data objects are assigned, a method based on the certainty of objects is presented for producing a new assignment order. In addition, pairwise constraints are incorporated into the self-organizing feature map, and a semi-supervised SOM algorithm based on pairwise constraints is put forward. The semi-supervised SOM algorithm is then used as a consensus function to combine multiple semi-supervised clustering solutions. Finally, experimental results demonstrate the validity of the proposed methods.
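
To make the constraint-violation problem concrete, here is a minimal sketch of the assignment step commonly associated with the basic Cop-Kmeans algorithm: each object goes to the nearest center that breaks no must-link or cannot-link constraint. It is not the improved variant proposed in the thesis, whose details the abstract does not give. The function names `violates` and `assign`, the data layout, and the example values are assumptions made for illustration.

```python
def violates(i, cluster, assignment, must_link, cannot_link):
    """Return True if placing object i in `cluster` breaks a constraint.

    assignment maps already-assigned object ids to cluster ids.
    must_link / cannot_link hold (a, b) pairs, treated as symmetric.
    """
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


def assign(i, point, centers, assignment, must_link, cannot_link):
    """Assign i to the nearest feasible center; return None if none exists."""
    def sq_dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, centers[c]))

    for c in sorted(range(len(centers)), key=sq_dist):
        if not violates(i, c, assignment, must_link, cannot_link):
            assignment[i] = c
            return c
    return None  # every cluster violates a constraint: the failure case


centers = [(0.0, 0.0), (10.0, 10.0)]
assignment = {0: 0}  # object 0 already sits in cluster 0
# Object 1 is nearest to center 0 but cannot share a cluster with object 0.
chosen = assign(1, (1.0, 1.0), centers, assignment, [], [(0, 1)])
# With a single cluster, the same constraint leaves no feasible choice.
stuck = assign(1, (1.0, 1.0), centers[:1], {0: 0}, [], [(0, 1)])
```

The `None` result is the situation in which the basic procedure cannot place an object, and whether it occurs depends on which objects were assigned earlier, which is why the assignment order matters.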
Keywords: Data Mining, Clustering Analysis, Clustering Ensemble, Self-organizing Feature Map, Semi-supervised Clustering