
Study on Ensemble Clustering for High-Dimensional Data

Posted on: 2020-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
GTID: 2428330599954613
Subject: Information and Communication Engineering
Abstract/Summary:
Clustering analysis is an active topic in pattern recognition and machine learning. It plays a vital role in many applications, such as information retrieval, marketing, and the Internet. When analyzing high-dimensional datasets, however, traditional clustering methods often fail to produce good results on ultra-high-dimensional data because of the limitations of their distance measures. In recent years, ensemble clustering has attracted increasing attention: it combines multiple clustering results into a unified structure and yields a final partition that is more robust, stable, and accurate. This thesis focuses on ensemble clustering for high-dimensional data and studies both the fusion algorithm and the subspace partitioning method.

1. Most existing fusion algorithms ignore the effectiveness of the base clusterings and treat them all equally, so their accuracy is easily degraded by low-quality base clusterings. Weighted ensemble clustering methods have therefore been proposed, but they still overlook the negative impact of poorly performing base clusterings on the final ensemble result. To address these limitations, this thesis proposes a new weighted ensemble clustering algorithm. Specifically, it measures the effectiveness of each base clustering by estimating the optimal matching score between that base clustering and the overall result. The estimate fully accounts for the cluster information in the base clustering, covering not only the matching degree of samples in the same cluster but also that of samples in different clusters. The weights of base clusterings that contribute negatively are then further reduced to obtain the final weight vector, from which a locally weighted co-association matrix is constructed to analyze the ensemble clustering structure.

2. High-dimensional data are sparse, locally correlated, and noisy, which makes clustering analysis more challenging. Because clusters in high-dimensional data usually lie in different subspaces, this thesis proposes a new subspace ensemble clustering algorithm for high-dimensional data analysis. Specifically, it introduces the idea of spectral feature selection and adopts a clustering algorithm based on competitive learning to divide the features of the high-dimensional data into feature clusters, so that the features in each cluster carry similar structural information about the dataset. Each feature cluster is then treated as a subspace and used to generate a base clustering. Finally, the weighted ensemble clustering algorithm described above is applied to obtain the ensemble result. Experimental results on different datasets demonstrate the feasibility and effectiveness of the proposed methods.
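The weighting and fusion steps in item 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the optimal matching score is replaced by a hypothetical pairwise-agreement score against the unweighted co-association matrix (same-cluster pairs are credited with their co-association value, different-cluster pairs with its complement), the weakening of negative contributors is a hypothetical shrink factor applied to below-average scores, weights are assigned per clustering rather than locally (per cluster), and the final partition is read off by thresholding the weighted co-association matrix and taking connected components. All function names are illustrative.

```python
from itertools import combinations


def coassociation(clusterings, weights=None):
    """Weighted co-association matrix: entry (i, j) is the weighted fraction
    of base clusterings that place samples i and j in the same cluster."""
    m, n = len(clusterings), len(clusterings[0])
    weights = [1.0] * m if weights is None else weights
    total = sum(weights)
    ca = [[0.0] * n for _ in range(n)]
    for labels, w in zip(clusterings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    ca[i][j] += w / total
    return ca


def agreement_score(labels, ca):
    """Hypothetical stand-in for the matching score: average agreement over
    all sample pairs, counting same-cluster AND different-cluster pairs."""
    pairs = list(combinations(range(len(labels)), 2))
    score = 0.0
    for i, j in pairs:
        score += ca[i][j] if labels[i] == labels[j] else 1.0 - ca[i][j]
    return score / len(pairs)


def clustering_weights(clusterings, shrink=0.1):
    """Score each base clustering against the unweighted ensemble, shrink
    below-average scores (hypothetical 'negative contribution' rule),
    and normalize the weights to sum to 1."""
    ca = coassociation(clusterings)
    scores = [agreement_score(labels, ca) for labels in clusterings]
    mean = sum(scores) / len(scores)
    weights = [s if s >= mean else s * shrink for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def consensus(ca, threshold=0.5):
    """Link samples whose co-association exceeds `threshold` and return the
    connected components as consensus cluster labels."""
    n = len(ca)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and ca[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    base = [
        [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],  # same partition as above, relabeled
        [0, 0, 1, 1, 2, 2],  # a noisier base clustering
    ]
    w = clustering_weights(base)
    print(w)
    print(consensus(coassociation(base, w)))  # [0, 0, 0, 1, 1, 1]
```

In this toy run the noisier third clustering receives the smallest weight, and the consensus recovers the two-group partition. Because the score uses only pairwise same/different relations, it is invariant to label permutations, which is why the first two clusterings get identical weights.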
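The feature-partitioning step in item 2 can be illustrated with plain winner-take-all competitive learning over features. This is a minimal Python sketch under stated assumptions: each feature is represented by its standardized column of values, the spectral feature selection component is omitted, and the farthest-point initialization, decaying learning rate, and helper names (competitive_feature_clusters, project) are hypothetical choices rather than details from the thesis. Each returned feature cluster is a subspace, and project() extracts the corresponding sub-dataset on which a base clusterer could then be run before the weighted ensemble step.

```python
import math


def standardize(column):
    """Z-score a feature column so features differing only by a positive
    scale or shift map to (nearly) the same vector."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n) or 1.0
    return [(x - mean) / std for x in column]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_feature_clusters(data, k, epochs=20, lr=0.3):
    """Group the columns (features) of `data` into at most k feature clusters
    using winner-take-all competitive learning; returns lists of feature indices."""
    n_features = len(data[0])
    features = [standardize([row[f] for row in data]) for f in range(n_features)]

    # Hypothetical farthest-point initialization so prototypes start apart.
    chosen = [0]
    while len(chosen) < k:
        chosen.append(max(range(n_features),
                          key=lambda f: min(sq_dist(features[f], features[c]) for c in chosen)))
    prototypes = [list(features[c]) for c in chosen]

    for epoch in range(epochs):
        rate = lr / (1 + epoch)  # decaying learning rate
        for vec in features:
            # Only the nearest prototype (the winner) is updated.
            winner = min(range(k), key=lambda c: sq_dist(vec, prototypes[c]))
            proto = prototypes[winner]
            for d in range(len(vec)):
                proto[d] += rate * (vec[d] - proto[d])

    assignment = [min(range(k), key=lambda c: sq_dist(vec, prototypes[c])) for vec in features]
    subspaces = [[f for f in range(n_features) if assignment[f] == c] for c in range(k)]
    return [s for s in subspaces if s]  # drop prototypes that won no feature


def project(data, subspace):
    """Restrict every sample to the features of one subspace."""
    return [[row[f] for f in subspace] for row in data]


if __name__ == "__main__":
    # Feature 1 = 2 * feature 0; feature 3 = feature 2 + 10.
    data = [[1, 2, 5, 15], [2, 4, 1, 11], [3, 6, 4, 14],
            [4, 8, 2, 12], [5, 10, 6, 16], [6, 12, 3, 13]]
    print(competitive_feature_clusters(data, k=2))  # [[0, 1], [2, 3]]
```

Standardizing before the competition is a design choice: it makes features with the same shape across samples compete for the same prototype regardless of their units, which loosely matches the goal of grouping features that carry similar structural information.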
Keywords/Search Tags: Ensemble clustering, Weighting model, High-dimensional data, Subspace analysis