
Study On Clustering Ensemble Selection Algorithm

Posted on: 2014-04-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L M Liu
GTID: 1268330401479290
Subject: Computer Science and Technology

Abstract/Summary:
Clustering ensemble addresses the cluster analysis problem by generating a large set of clustering partitions with clustering algorithms and combining them through a consensus function into a final result; it has become a research focus in data mining. Traditional clustering ensemble algorithms, however, combine all available partitions to obtain the final clustering, whereas in supervised classification it is known that selective classifier ensembles can achieve better solutions than traditional ensemble methods that use every member. In recent years there has been considerable research activity on unsupervised clustering ensemble selection, and studies have shown that it can significantly improve the performance of cluster analysis. This dissertation investigates the key technologies of clustering ensemble selection, including dimension reduction, selection strategy, and the consensus function, and finally applies clustering ensemble selection to multiple clustering analysis.

First, the dissertation studies dimension reduction algorithms, especially principal component analysis (PCA). Traditional PCA is generally formulated in terms of matrix rank, and rank minimization is a non-convex, discrete, and computationally complex problem. To address this, a robust model is proposed that combines the L1 norm with the trace norm, the latter serving as a convex surrogate for rank. The dissertation also derives an efficient augmented Lagrange multiplier (ALM) algorithm for the resulting nonlinear optimization problem. Both mathematical analysis and visual results demonstrate the efficiency and good performance of the proposed method.

Next, the dissertation studies the selection strategy of clustering ensemble selection algorithms and proves theoretically that clustering ensemble selection is superior to clustering ensemble without selection.
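The dissertation's own robust PCA model and ALM derivation are not reproduced in this abstract. As a hedged illustration of the general idea only, the sketch below solves the standard robust PCA (principal component pursuit) problem, which pairs a trace-norm term on a low-rank component with an L1 term on a sparse error component, using an inexact ALM iteration in NumPy. The function names, the default weight `lam = 1/sqrt(max(m, n))`, and the penalty schedule (`mu0 = 1.25/||D||_2`, growth factor `rho = 1.5`) are assumptions taken from common practice, not from the dissertation.

```python
import numpy as np


def soft_threshold(X, tau):
    # Elementwise shrinkage: proximal operator of tau * ||X||_1.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    # Shrink singular values: proximal operator of tau * ||X||_* (trace norm).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def robust_pca(D, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Split D into a low-rank part L and a sparse part S by solving
        min ||L||_* + lam * ||S||_1   subject to   D = L + S
    with an inexact augmented Lagrange multiplier (ALM) scheme.
    Parameter defaults are assumptions, not values from the dissertation."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default weight on the L1 term
    norm_fro = np.linalg.norm(D, "fro")
    norm_two = np.linalg.norm(D, 2)  # largest singular value of D
    # Scale the dual variable so that ||Y||_2 <= 1 and max|Y| <= lam.
    Y = D / max(norm_two, np.abs(D).max() / lam)
    mu = 1.25 / norm_two   # initial penalty parameter (assumed default)
    mu_max = mu * 1e7      # cap on the penalty so 1/mu stays well scaled
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        # One pass of minimising the augmented Lagrangian over L, then over S.
        L = singular_value_threshold(D - S + Y / mu, 1.0 / mu)
        S = soft_threshold(D - L + Y / mu, lam / mu)
        residual = D - L - S
        Y = Y + mu * residual  # dual ascent on the constraint D = L + S
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") / norm_fro < tol:
            break
    return L, S
```

On a synthetic matrix built as a rank-2 matrix plus a small fraction of large sparse errors, `L` should closely approximate the low-rank component and `S` should absorb the errors; how well this holds on real data depends on the rank, the sparsity level, and `lam`.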
To select the reference partition, the dissertation uses clustering validity evaluation to assess all available ensemble partitions and takes the one of highest quality as the reference partition. Building on this, it assigns weights to ensemble members according to attribute significance in tolerance relation theory, and finally presents the framework of a selective weighted clustering ensemble algorithm. The dissertation also analyses two parameters, the number of clusters and the balance factor, and proposes an algorithm that optimizes the number of clusters based on a consensus criterion function.

To address the consensus function problem of clustering ensemble, a novel algorithm based on nonnegative matrix factorization (NMF) is proposed. The dissertation introduces spectral clustering to group and select the clustering partitions, and then proposes a clustering ensemble selection algorithm based on NMF. For binary data, a clustering ensemble algorithm based on binary nonnegative matrix factorization (BMF) is proposed. Experiments show that the new algorithms are effective and can significantly improve clustering performance.

To handle multiple clustering analysis, the thesis proposes a new multiple clustering algorithm based on clustering ensemble selection. The algorithm obtains similarity matrices through clustering ensemble selection, constructs a hierarchical tree from these matrices, cuts the tree with a modularity-based algorithm, and thereby obtains the multiple clustering results. Extensive experimental results show that the resulting clusterings differ substantially from one another while each remains of high quality.

Finally, the innovations of this thesis are summarized, and directions for future research are presented.
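The abstract does not specify the NMF-based consensus function in detail. A minimal sketch of one common way to use NMF as a consensus function, assuming the ensemble is summarised by its co-association matrix, is shown below: the matrix is factorised with Lee-Seung multiplicative updates and each object takes the label of its dominant factor. The names `co_association` and `nmf_consensus` are hypothetical, and the spectral-clustering selection step and the binary (BMF) variant described above are not modelled.

```python
import numpy as np


def co_association(partitions):
    """Average co-association matrix of an ensemble (hypothetical helper).

    partitions: array-like of shape (n_partitions, n_objects) holding cluster
    labels. Entry (i, j) is the fraction of partitions that place objects
    i and j in the same cluster; label names need not match across partitions.
    """
    P = np.asarray(partitions)
    n = P.shape[1]
    M = np.zeros((n, n))
    for labels in P:
        M += (labels[:, None] == labels[None, :]).astype(float)
    return M / P.shape[0]


def nmf_consensus(partitions, k, n_iter=500, seed=0, eps=1e-10):
    """Consensus labels from an NMF of the co-association matrix (a sketch).

    Factorises M ~ W @ H (W: n x k, H: k x n, both nonnegative) with
    Lee-Seung multiplicative updates for the Frobenius loss, then assigns
    each object to the factor with the largest weight in its row of W.
    """
    M = co_association(partitions)
    n = M.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.random((n, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative; eps avoids 0/0.
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W.argmax(axis=1)
```

For example, `nmf_consensus([[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [2, 2, 2, 5, 5, 5]], k=2)` should place objects 0 to 2 in one cluster and objects 3 to 5 in the other; which integer each group receives depends on the random initialisation. Working on the co-association matrix sidesteps the label-correspondence problem between ensemble members.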
Keywords/Search Tags: clustering ensemble selection, dimension reduction, selection strategy, consensus function, nonnegative matrix factorization, multiple clustering