
Adaptive Semi-supervised Clustering Ensemble For High Dimensional Data

Posted on: 2020-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Kuang
GTID: 2428330590960637
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, clustering analysis of massive amounts of high-dimensional data has become an active research topic in academia. Traditional clustering methods cannot handle high-dimensional data effectively, so researchers have proposed semi-supervised clustering ensemble methods. A semi-supervised clustering ensemble combines clustering analysis with semi-supervised learning and ensemble learning, and can significantly improve the accuracy, stability, and robustness of clustering results on high-dimensional data.

However, current semi-supervised clustering ensemble methods have several shortcomings: 1) no effective mechanism is designed for the problems specific to high-dimensional data; 2) prior knowledge, especially pairwise constraint information, is not fully exploited; 3) no adaptive process is used to optimize the generation of the clustering ensemble; and 4) the consensus function takes the results of all ensemble members into account, even when some members are of poor quality.

To address these shortcomings, this thesis proposes a double adaptive semi-supervised clustering ensemble method (DASSCE). Its main contributions are as follows:
1) A subspace generation method based on bagging constraints: bagging is applied to the pairwise constraints to produce a set of constraint subsets, and each constraint subset guides the generation of a different subspace.
2) An adaptive clustering ensemble selection method with constraints, which effectively removes redundant and poor-quality partitions that would otherwise degrade clustering performance.
3) An adaptive optimization process over the set of subspaces, adopted to obtain better clustering performance.

To evaluate the effectiveness of DASSCE, thorough experiments were designed on a variety of high-dimensional data sets with different characteristics, derived from real-world public data. The experimental results show that, owing to these three innovations, DASSCE achieves better clustering performance on high-dimensional data than other semi-supervised clustering methods.
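The abstract names the stages of DASSCE but gives no formulas, so the following is only an illustrative sketch under our own assumptions, not the thesis's algorithm. We assume constraints are `(i, j, must_link)` triples; "bagging constraints" is read as bootstrap resampling of the constraint list; a random feature subspace is scored by mean cannot-link distance minus mean must-link distance; ensemble selection keeps non-duplicate members whose constraint-satisfaction rate is at least the ensemble mean; and the consensus function is a co-association majority vote. All function names are hypothetical, the base clusterer is replaced by hand-written partitions, and the adaptive (evolutionary) optimization of the subspace set is not modeled. Only the Python standard library is used.

```python
import math
import random
from statistics import mean

# A constraint is (i, j, must_link): True = must-link, False = cannot-link (assumed format).


def bag_constraints(constraints, n_bags, rng):
    """Bootstrap-resample the constraint list into n_bags subsets (with replacement)."""
    return [[rng.choice(constraints) for _ in constraints] for _ in range(n_bags)]


def subspace_score(data, features, constraints):
    """Mean cannot-link distance minus mean must-link distance in a feature subspace.

    Higher means the subspace keeps must-linked points close and cannot-linked points apart.
    """
    def dist(i, j):
        return math.dist([data[i][f] for f in features], [data[j][f] for f in features])

    ml = [dist(i, j) for i, j, must in constraints if must]
    cl = [dist(i, j) for i, j, must in constraints if not must]
    return (mean(cl) if cl else 0.0) - (mean(ml) if ml else 0.0)


def generate_subspace(data, constraints, dim, n_candidates, rng):
    """Sample random feature subsets of size dim; keep the one the constraints favour most."""
    n_features = len(data[0])
    candidates = [tuple(sorted(rng.sample(range(n_features), dim)))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda fs: subspace_score(data, fs, constraints))


def satisfaction(labels, constraints):
    """Fraction of pairwise constraints that a partition (one label per point) respects."""
    ok = sum((labels[i] == labels[j]) == must for i, j, must in constraints)
    return ok / len(constraints)


def canonical(labels):
    """Relabel clusters in order of first appearance so relabelled duplicates compare equal."""
    seen = {}
    return tuple(seen.setdefault(label, len(seen)) for label in labels)


def select_members(partitions, constraints):
    """Keep non-duplicate members whose satisfaction is at least the ensemble mean."""
    scores = [satisfaction(p, constraints) for p in partitions]
    cutoff = mean(scores)  # data-driven threshold: our assumption, not DASSCE's rule
    kept = []
    for p, s in zip(partitions, scores):
        c = canonical(p)
        if s >= cutoff and c not in kept:
            kept.append(c)
    return kept


def consensus(partitions, threshold=0.5):
    """Co-association vote: join i and j if they share a cluster in > threshold of members."""
    n = len(partitions[0])
    parent = list(range(n))  # union-find forest over the n points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(p[i] == p[j] for p in partitions) / len(partitions)
            if together > threshold:
                parent[find(i)] = find(j)
    return canonical([find(i) for i in range(n)])


if __name__ == "__main__":
    # Toy data: features 0-1 separate two groups; features 2-3 are noise.
    data = [[0.0, 0.0, 5.0, 1.0], [0.1, 0.2, 0.0, 9.0], [0.2, 0.1, 9.0, 4.0],
            [5.0, 5.0, 1.0, 8.0], [5.1, 4.9, 8.0, 0.0], [4.9, 5.2, 3.0, 5.0]]
    constraints = [(0, 1, True), (3, 4, True), (0, 3, False), (1, 4, False)]
    rng = random.Random(0)
    bags = bag_constraints(constraints, n_bags=5, rng=rng)
    print([generate_subspace(data, bag, 2, 50, rng) for bag in bags])
    # Hand-written members stand in for base clusterings run on those subspaces.
    members = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1],
               [0, 0, 0, 1, 1, 2], [0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0]]
    selected = select_members(members, constraints)
    print(consensus(selected))  # -> (0, 0, 0, 1, 1, 1)
```

On this toy input, selection drops the relabelled duplicate and the two members that satisfy only half of the constraints, and the majority-vote consensus recovers the groups {0, 1, 2} and {3, 4, 5}. Using the ensemble mean as the cutoff is just one simple way to make selection data-driven; the adaptive criterion in the thesis may well differ.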
Keywords/Search Tags: Data mining, Clustering ensemble, Semi-supervised learning, Evolutionary computation