Research On Cluster Ensemble Approaches With Semi-supervised Information And Large Scale Dataset

Posted on: 2019-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Lin
Full Text: PDF
GTID: 2428330545997406
Subject: Computer Science and Technology
Abstract/Summary:
Cluster analysis, which groups samples into clusters according to similarities derived from the samples' features without any class labels, is one of the most significant unsupervised learning approaches. It has already been applied to many real-world business scenarios and is also used as a pre-processing step for supervised learning algorithms. However, because clustering algorithms depend strongly on a specific distance metric and on ill-posed problem definitions, even state-of-the-art algorithms cannot always be expected to achieve ideal results. To address this dilemma, researchers have proposed cluster ensemble methods and semi-supervised clustering methods, which improve the stability and robustness of clustering results with the help of ensemble learning and of external semi-supervised information such as pairwise constraints, respectively. Semi-supervised cluster ensemble combines the two approaches, aiming to benefit from both techniques and to obtain even better performance. The thesis first focuses on how to utilize pairwise constraints in semi-supervised cluster ensemble and identifies some possible problems of generating ensemble members with pairwise constraints. Accordingly, we propose two semi-supervised cluster ensemble approaches that use pairwise constraints in the consensus clustering step. The first, based on the idea of label propagation, propagates pairwise constraints on the graph of samples built from the co-association matrix to obtain a better consensus result. The second is a heuristic weighting method that quantifies the quality of clusters by the satisfaction of pairwise constraints and by internal quality measurements. Additionally, the thesis studies the problem that current cluster ensemble approaches based on the co-association matrix may not scale to large datasets, and proposes two scalable cluster ensemble methods, one using representative points and one using low-rank matrix approximation, based on two state-of-the-art studies on large-scale spectral clustering. Experimental results demonstrate that the proposed propagation-based and weighting-based semi-supervised cluster ensemble approaches make effective use of pairwise constraints during consensus clustering and achieve better performance and time efficiency than generating ensemble members with pairwise constraints. The results also show that the proposed large-scale cluster ensemble approaches can be applied to large-scale datasets and produce reasonable output.
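As an illustration of the co-association matrix that the consensus step builds on, the following is a minimal sketch and not the thesis's implementation. It assumes each ensemble member is given as a vector of cluster labels; the function names `co_association` and `apply_constraints` and the toy labels are hypothetical. The constraint step is a deliberately simplified stand-in that only overrides the constrained pairs themselves, whereas the thesis propagates the constraints over the sample graph.

```python
import numpy as np

def co_association(labelings):
    """Fraction of ensemble members that put each pair of samples in the same cluster."""
    labelings = np.asarray(labelings)            # shape: (n_members, n_samples)
    n_members, n_samples = labelings.shape
    co = np.zeros((n_samples, n_samples))
    for labels in labelings:
        # Boolean matrix: True where samples i and j share a label in this member.
        co += labels[:, None] == labels[None, :]
    return co / n_members

def apply_constraints(co, must_link, cannot_link):
    """Simplified stand-in (an assumption, not the thesis's propagation scheme):
    set must-link pairs to 1 and cannot-link pairs to 0, leaving other entries untouched."""
    out = co.copy()
    for i, j in must_link:
        out[i, j] = out[j, i] = 1.0
    for i, j in cannot_link:
        out[i, j] = out[j, i] = 0.0
    return out

# Toy ensemble: three base clusterings of four samples (hypothetical labels).
members = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
co = co_association(members)
adjusted = apply_constraints(co, must_link=[(1, 2)], cannot_link=[(2, 3)])
```

The matrix has one entry per pair of samples, so its memory cost grows quadratically with the number of samples, which is the scalability problem the abstract refers to. A consensus partition could then be obtained by running any similarity-based clustering method on `adjusted`.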
Keywords/Search Tags: Cluster ensemble, Semi-supervised clustering, Pairwise constraints