Clustering is the process of grouping a collection of physical or abstract objects into multiple classes, each composed of similar objects. Many clustering algorithms exist in the literature, but each has its own weaknesses, so no single algorithm meets the needs of every practical problem. Recently, researchers have begun to apply ensemble techniques to clustering and have proposed a number of cluster ensemble algorithms to improve clustering performance. As a rapidly developing area, clustering ensembles have become an important research topic in ensemble learning.

This paper studies cluster ensemble methods based on hierarchical clustering and on clustering validity. The main contributions are as follows.

First, building on the generic pairwise cluster ensemble (GPCE) method, a cluster ensemble method based on hierarchical clustering (HCCE) is studied and its framework is given. In the experiments, to compare the three distance measures used in the hierarchical consensus step, namely single linkage, complete linkage, and average linkage, the agreement with the known class labels, measured by Micro-precision, is used to evaluate the ensemble results.

Second, an improved cluster ensemble method (HCICE) is proposed, and cluster validity is studied by introducing a stability index into the GPCE method. The experiments show that this validity index can select better clusters and produce better partitions.
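The pairwise consensus step described above can be sketched as follows. This is a minimal illustration, not the thesis's exact HCCE implementation: base partitions are combined into a co-association matrix (the fraction of partitions that place each pair of points in the same cluster), and hierarchical clustering with single, complete, or average linkage is run on the induced dissimilarity. The toy base partitions and function names are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def co_association(partitions):
    """Fraction of base partitions that put each pair of points together."""
    n = len(partitions[0])
    ca = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        ca += labels[:, None] == labels[None, :]
    return ca / len(partitions)

def hcce(partitions, k, method="average"):
    """Consensus partition: hierarchical clustering on 1 - co-association."""
    dist = 1.0 - co_association(partitions)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    Z = linkage(condensed, method=method)  # 'single', 'complete', or 'average'
    return fcluster(Z, t=k, criterion="maxclust")

# Three noisy base partitions of six points; the underlying grouping
# is {0, 1, 2} versus {3, 4, 5}.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],  # a poor-quality base partition
]
for method in ("single", "complete", "average"):
    print(method, hcce(base, k=2, method=method))
```

On this toy input all three linkage criteria recover the same two-cluster consensus despite the noisy third partition; on real data the three criteria can differ, which is what the Micro-precision comparison in the experiments measures.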
To show that the HCICE method outperforms both the GPCE method and individual clustering algorithms, the Adjusted Rand Index (ARI) and the Jaccard index are used to evaluate the clustering results. We also study how accuracy varies with the ensemble size and with the number of clusters, respectively.

Third, based on ensemble selection, a cluster ensemble method using a greedy selection strategy is proposed, together with a new objective function called the Joint Criterion. First, n clustering solutions (partitions) are generated with the HCICE algorithm but not yet combined; second, K partitions are chosen by the greedy selection strategy driven by the Joint Criterion; finally, the K selected partitions are combined into a consensus clustering. To demonstrate the benefit of cluster ensemble selection, the experiments compare it against using all available partitions, with ARI and the Jaccard index again used to evaluate the results.
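The greedy selection step can be sketched as below. The thesis's Joint Criterion is not reproduced here; as a hedged stand-in, the score combines a quality term (mean ARI against the whole ensemble) with a diversity term (one minus mean ARI against the partitions already selected), which is a common way to trade off the two quantities such a criterion balances. The pairwise Jaccard index used for evaluation is also shown. All function names and the toy partitions are assumptions.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def pair_jaccard(a, b):
    """Jaccard index over point pairs: co-clustered in both / in either."""
    a, b = np.asarray(a), np.asarray(b)
    iu = np.triu_indices(len(a), 1)
    pa = (a[:, None] == a[None, :])[iu]
    pb = (b[:, None] == b[None, :])[iu]
    union = np.sum(pa | pb)
    return np.sum(pa & pb) / union if union else 1.0

def greedy_select(partitions, k):
    """Greedily pick k of n partitions, balancing quality and diversity."""
    n = len(partitions)
    ari = np.array([[adjusted_rand_score(p, q) for q in partitions]
                    for p in partitions])
    quality = ari.mean(axis=1)           # agreement with the full ensemble
    selected = [int(np.argmax(quality))]  # seed with the best partition
    while len(selected) < k:
        best, best_score = -1, -np.inf
        for i in range(n):
            if i in selected:
                continue
            diversity = 1.0 - ari[i, selected].mean()
            score = quality[i] + diversity  # stand-in for the Joint Criterion
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

parts = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as parts[0], relabelled
    [0, 1, 0, 1, 0, 1],  # a poor but diverse partition
]
chosen = greedy_select(parts, k=3)
print("selected partitions:", chosen)
print("Jaccard(parts[0], parts[3]) =", pair_jaccard(parts[0], parts[3]))
```

Note that both ARI and the pairwise Jaccard index are invariant to relabelling, so `parts[3]` scores 1.0 against `parts[0]` even though the cluster labels are swapped; this is why label-invariant indices are the appropriate choice for comparing partitions from independent runs.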