
Probabilistic Model Based Cluster Ensemble Algorithm

Posted on: 2015-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: R Dong
GTID: 2308330464955517
Subject: Computer software and theory
Abstract/Summary:
Clustering analysis is an important technique for exploring the underlying structure of data and has been applied in many research fields that involve analyzing or processing multivariate data. Clustering algorithms partition the data into a finite set of groups according to some optimization criterion, so that similar objects are placed in the same group and dissimilar objects in different groups. However, because most existing clustering algorithms make implicit or explicit assumptions about the underlying structure of the data set, no single clustering algorithm is applicable to all data sets. Cluster ensemble algorithms, which combine a set of clustering results into a single consensus result, were therefore proposed to provide robust, high-quality clustering.

Current research on cluster ensemble methods focuses mainly on cluster ensemble generation, cluster ensemble selection, and consensus function design. In this thesis, we analyze and compare the characteristics of existing cluster ensemble algorithms and propose a novel cluster ensemble algorithm from a probabilistic perspective. We assume that all observed clustering results are generated by a latent cluster model under the control of two probabilistic parameters. An EM-style method is then used to find the latent cluster model with maximum likelihood. Experimental results show that our algorithm outperforms several state-of-the-art cluster ensemble algorithms, including CSPA, MCLA, HGPA, and EAC-AL.
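To make the idea of combining several clustering results concrete, the sketch below builds a co-association matrix, the pairwise agreement statistic that evidence-accumulation baselines such as EAC-AL start from. It is not the probabilistic model proposed in the thesis, whose two parameters and EM updates are not specified in this abstract. The function name `co_association` and the toy partitions are assumptions made for illustration only.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that place objects i and j in the same cluster.

    `partitions` is a list of label lists, one per base clustering.
    Label values only need to be consistent within a single clustering,
    so permuted labels across clusterings do not matter.
    """
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same objects")
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Hypothetical toy ensemble: three base clusterings of four objects.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, labels permuted
    [0, 0, 0, 1],  # a clustering that disagrees about object 2
]
C = co_association(partitions)
```

Objects that the base clusterings group together consistently get entries near 1, and a consensus function can then cluster this matrix (for example, with average-linkage hierarchical clustering) to obtain a single partition.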
Keywords/Search Tags: Clustering analysis, Cluster ensembles, Data mining, Machine learning