
Study of Cluster Ensemble Methods Based on Hierarchical Clustering

Posted on: 2011-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2178360308454090
Subject: Computer software and theory
Abstract/Summary:
Clustering is the process of grouping a collection of physical or abstract objects into multiple classes, each composed of similar objects. Many clustering algorithms exist in the literature, but each individual algorithm has its own limitations, so no single algorithm readily meets the needs of practical problems. Researchers have therefore begun to apply ensemble techniques to clustering and have proposed a number of cluster ensemble algorithms to improve clustering performance. As a rapidly developing area, cluster ensembles have become an important research topic in ensemble learning. This thesis studies cluster ensemble methods based on hierarchical clustering, together with clustering validity. The main contents are as follows.
First, building on the generic pairwise cluster ensemble (GPCE) method, cluster ensemble methods based on hierarchical clustering (HCCE) are studied and a framework for HCCE is given. In the experiments, the three inter-cluster distance measures that hierarchical clustering can use in the fusion step (single linkage, complete linkage, and average linkage) are compared, and the ensemble results are evaluated with Micro-precision, a measure computed from the known class labels of the data.
Second, a stability index is introduced into the GPCE method, yielding an improved cluster ensemble method, and cluster validity is studied on this basis. The experiments show that this validity index selects better clusters and produces better partitions.
To show that the HCICE method outperforms both the GPCE method and individual clustering algorithms, the clustering results are evaluated with the Adjusted Rand Index (ARI) and the Jaccard Index. In addition, we examine how accuracy varies with ensemble size and with the number of clusters.
Third, based on ensemble selection, cluster ensemble methods using a greedy selection strategy are proposed, together with a new objective function called the Joint Criterion. The procedure has three steps: n clustering solutions (partitions) are first generated with the HCICE algorithm without being combined; a greedy selection strategy guided by the Joint Criterion then selects K of them; finally, the K selected solutions are combined. Experiments compare this cluster ensemble selection with combining all available clustering solutions (partitions), again using ARI and the Jaccard Index to evaluate the results.
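The hierarchical fusion step in the first point can be sketched as follows. This is a minimal illustration, assuming the common co-association formulation of pairwise cluster ensembles: the fraction of base partitions that put objects i and j in the same cluster is treated as a similarity, 1 minus that fraction as a distance, and the objects are then merged bottom-up with single, complete, or average linkage until k clusters remain. The names `co_association`, `hierarchical_fusion`, and `micro_precision` are hypothetical, and `micro_precision` assumes the majority-class form of the measure (each cluster is credited with its most frequent true class); the thesis may define these details differently.

```python
from collections import Counter
from itertools import combinations


def co_association(partitions):
    """Return S[i][j]: fraction of base partitions placing objects i and j together."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def linkage_distance(a, b, dist, method):
    """Distance between clusters a and b (lists of object indices)."""
    ds = [dist[i][j] for i in a for j in b]
    if method == "single":
        return min(ds)
    if method == "complete":
        return max(ds)
    if method == "average":
        return sum(ds) / len(ds)
    raise ValueError(f"unknown linkage: {method}")


def hierarchical_fusion(partitions, k, method="average"):
    """Hypothetical HCCE-style sketch: agglomerative clustering on the
    co-association distance 1 - S[i][j], stopped at k clusters."""
    sim = co_association(partitions)
    n = len(sim)
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        # Closest pair of current clusters under the chosen linkage.
        _, x, y = min(
            (linkage_distance(clusters[x], clusters[y], dist, method), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        clusters[x] += clusters[y]  # x < y, so deleting y leaves x in place
        del clusters[y]
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def micro_precision(pred, truth):
    """Majority-class micro-precision: for each predicted cluster, count the
    objects of its most frequent true class; divide the total by n."""
    groups = {}
    for p, t in zip(pred, truth):
        groups.setdefault(p, []).append(t)
    correct = sum(Counter(ts).most_common(1)[0][1] for ts in groups.values())
    return correct / len(pred)
```

For example, with the base partitions `[[0,0,0,1,1,1], [0,0,1,1,1,1], [1,1,1,0,0,0]]` and k = 2, all three linkages return `[0,0,0,1,1,1]`, whose micro-precision against the labels `[0,0,0,1,1,1]` is 1.0. The naive merge loop is only meant for small toy inputs.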
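The evaluation measures and the ensemble-selection procedure from the second and third points can be sketched as below. `adjusted_rand_index` and `jaccard_index` follow the standard pair-counting definitions. The abstract does not define the Joint Criterion, so `joint_criterion` here is a hypothetical stand-in: it weights the average quality of the selected partitions (mean ARI of each against the whole pool) against their diversity (1 minus the mean pairwise ARI inside the selection) with an assumed weight `alpha`. `greedy_select` is a plain forward greedy search that adds, one at a time, the partition giving the highest score until K are chosen.

```python
from collections import Counter
from itertools import combinations
from math import comb


def _pair_counts(x, y):
    """Pair-counting sums for two labelings of the same objects."""
    together_both = sum(comb(v, 2) for v in Counter(zip(x, y)).values())
    together_x = sum(comb(v, 2) for v in Counter(x).values())
    together_y = sum(comb(v, 2) for v in Counter(y).values())
    return together_both, together_x, together_y, comb(len(x), 2)


def adjusted_rand_index(x, y):
    """Standard ARI; needs at least two objects."""
    both, sx, sy, total = _pair_counts(x, y)
    expected = sx * sy / total
    max_index = (sx + sy) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (both - expected) / (max_index - expected)


def jaccard_index(x, y):
    """Pairs together in both labelings / pairs together in at least one."""
    both, sx, sy, _ = _pair_counts(x, y)
    union = sx + sy - both
    return both / union if union else 1.0


def joint_criterion(selected, pool, alpha=0.5):
    """HYPOTHETICAL stand-in for the Joint Criterion:
    alpha * quality + (1 - alpha) * diversity."""
    quality = sum(
        sum(adjusted_rand_index(pool[i], p) for p in pool) / len(pool)
        for i in selected
    ) / len(selected)
    pairs = list(combinations(selected, 2))
    diversity = (
        1 - sum(adjusted_rand_index(pool[i], pool[j]) for i, j in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return alpha * quality + (1 - alpha) * diversity


def greedy_select(pool, k, alpha=0.5):
    """Forward greedy selection of k partition indices from pool."""
    selected = []
    while len(selected) < k:
        candidates = [i for i in range(len(pool)) if i not in selected]
        best = max(candidates, key=lambda i: joint_criterion(selected + [i], pool, alpha))
        selected.append(best)
    return selected
```

With the toy pool `[[0,0,0,1,1,1], [0,0,0,1,1,1], [0,0,1,1,1,1], [0,1,0,1,0,1]]` and k = 2, this sketch returns `[0, 3]`: the duplicate partition 1 adds no diversity, and at `alpha = 0.5` the diversity term outweighs the low quality of partition 3. That sensitivity to the weighting is exactly what a real joint criterion has to balance; the selected indices would then be passed to a fusion step such as the hierarchical one described in the first point.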
Keywords/Search Tags: Clustering, Cluster ensemble, Fusion, Clustering validity, Ensemble selection