Feature Subspace Based Hybrid Clustering Ensemble Approach

Posted on: 2019-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Chen
GTID: 2428330566486601
Subject: Computer Science and Technology
Abstract/Summary:
Clustering ensemble focuses on integrating different clustering results or clustering models. Although clustering ensemble achieves better robustness, stability and accuracy than single clustering algorithms, it has the following limitations: (1) the clustering of each ensemble member suffers from the curse of dimensionality; (2) it requires finding common algorithms and parameters; (3) it requires obtaining different views from the original datasets; (4) there is no effective consensus between the hard and soft partitioning adopted by the base clusterings; (5) the integration results are highly likely to contain redundant and interfering information. This thesis proposes two clustering ensemble methods to address these limitations.

The first method is Soft Subspace Clustering Ensemble based on Latent Model (SSCELM). It describes the characteristics of feature distributions by Jensen-Shannon divergence and combines them with fuzzy theory to construct soft subspaces, which improves clustering performance in low-dimensional subspaces and increases the diversity of the results being integrated. A probability attribute matrix is then generated by integrating the probabilistic latent semantic analysis model to strengthen the category probability, where a latent factor analysis method is adopted to obtain the probability factor.

The second method is Random Subspace Hybrid Consensus Clustering based on Adaptive Three-way Decision (RSHCC). A doubly stochastic scheme, combining random subspaces and random parameters, is adopted to ensure the diversity of results. A three-way decision system is constructed in which hard and soft partitioning information is combined with rough set theory. RSHCC adopts a hybrid clustering validity index strategy and adaptively reduces redundant information using the equivalence relation in three-way decision, which strengthens the stability of class boundaries.

Experimental results on 18 real-world datasets show that SSCELM obtains satisfactory clustering performance, as demonstrated by significant differences in features. In addition, RSHCC outperforms the majority of state-of-the-art clustering ensemble approaches on multiple datasets. Non-parametric statistical tests are also used to compare the approaches. Based on these observations, we conclude that the proposed clustering ensemble methods can effectively improve clustering performance and provide a reliable guarantee for knowledge mining.
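
The abstract names Jensen-Shannon divergence as the measure SSCELM uses to describe feature distributions, but it does not say how the divergence is applied when building soft subspaces. As a minimal sketch of the divergence itself only, assuming discrete distributions given as lists of probabilities that sum to 1, it can be computed as follows. The function names `kl_divergence` and `js_divergence` are illustrative and are not taken from the thesis.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Uses the mixture m = (p + q) / 2. With base-2 logarithms the value
    lies in [0, 1]; it is symmetric in p and q.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: two made-up histograms over the same three bins.
feature_a = [0.7, 0.2, 0.1]
feature_b = [0.1, 0.2, 0.7]
divergence = js_divergence(feature_a, feature_b)
```

Identical distributions give a divergence of 0, and distributions with disjoint support give 1, so a larger value indicates that the two distributions differ more strongly.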
Keywords/Search Tags:Ensemble learning, Subspace, Fuzzy theory, Hybrid clustering, Three-way Decision