
Fusion Based On Fuzzy Matrix Clustering

Posted on: 2009-12-06 | Degree: Master | Type: Thesis
Country: China | Candidate: M Zhu | Full Text: PDF
GTID: 2208360245979057 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Cluster analysis is an important method of exploratory data analysis, especially in fields such as data mining and knowledge discovery, and it is applied across many engineering and scientific disciplines, including biology, psychology, medicine, and marketing. Clustering organizes data by abstracting its underlying structure, either as a grouping of individuals or as a hierarchy of groups. This thesis studies clustering ensemble algorithms in detail, both in theory and through extensive MATLAB experiments, and, most importantly, proposes a new and effective method based on clustering ensembles. Clustering ensembles are a powerful tool that greatly improves the stability and robustness of unsupervised classification. They aim to remedy the weaknesses of single clustering algorithms: each single algorithm makes its own assumptions about how the data are distributed, and its output is sensitive to its input parameters and its initialization. The basic idea of a clustering ensemble is therefore to run clustering many times, either with the same algorithm under different parameters, initializations, or data samples, or with different clustering algorithms, and to collect the resulting partitions, which in the usual case are vectors of cluster labels. The function that combines these multiple partitions into a single final labeling is usually called the consensus function. The main distinction between a clustering ensemble and a traditional clustering algorithm lies in what each operates on: a traditional algorithm works on the data set itself and must account for its distribution and underlying structure.
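The ensemble-generation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's MATLAB code: the base algorithm is assumed to be plain k-means, run with several numbers of clusters and a fresh random initialization each time, and the consensus function shown is a co-association (similarity-matrix) approach, in which the value for two points is the fraction of partitions that place them in the same cluster and the final labels are the connected components of the pairs whose value exceeds a threshold (which amounts to cutting a single-link dendrogram at that level). The function names `kmeans`, `build_ensemble`, `co_association`, and `consensus`, the threshold of 0.5, and the synthetic two-blob data are all hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iters=50):
    """Lloyd's k-means with centres drawn at random from the data; returns one label per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_ensemble(points, ks, runs_per_k):
    """Generate diverse hard partitions: several k values, a new random initialisation per run."""
    partitions, seed = [], 0
    for k in ks:
        for _ in range(runs_per_k):
            partitions.append(kmeans(points, k, seed))
            seed += 1
    return partitions


def co_association(partitions, n):
    """Entry (i, j) is the fraction of partitions that put points i and j in the same cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus(co, threshold=0.5):
    """Final labels: connected components of the graph linking pairs with co-association > threshold."""
    n = len(co)
    final, next_label = [-1] * n, 0
    for start in range(n):
        if final[start] != -1:
            continue
        final[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if final[j] == -1 and co[i][j] > threshold:
                    final[j] = next_label
                    stack.append(j)
        next_label += 1
    return final


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: two well-separated 2-D Gaussian blobs of 20 points each.
    data = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
            + [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(20)])
    parts = build_ensemble(data, ks=[2, 3, 4], runs_per_k=5)
    print(consensus(co_association(parts, len(data))))
```

Because labels are only ever compared within a single partition, the co-association matrix sidesteps the label-correspondence problem: cluster 0 in one run need not correspond to cluster 0 in another.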
A clustering ensemble, by contrast, works on the outputs of traditional clustering. Rather than modeling the distribution of the data set, which is the task of the base clustering algorithms, it seeks only to maximize the information shared by the partitions they produce. A clustering ensemble can thus be seen as clustering of cluster partitions. Finding a suitable consensus function, however, is a hard problem. Consensus functions in current research include similarity-matrix-based, hypergraph-based, mutual-information-based, and statistics-based methods. All of these take partition label vectors as input, and a label vector is the output of a hard-partition clustering algorithm. Building on statistics and probability theory, this thesis instead uses a fuzzy clustering algorithm as the generative step and takes the fuzzy membership matrices as the input to the consensus function. After multiple runs of a fuzzy (soft-partition) clustering algorithm, the method treats the data points as independent, each with its own degrees of membership in the classes, and derives the prior probability of the data points. It then builds a finite mixture model and uses the EM algorithm to estimate its parameters, namely the expected membership of each data point in each mixture component. The proposed algorithm is superior on many data sets. Extensive experiments compare it with other ensemble algorithms on a range of data sets from the UCI Machine Learning Repository, and the results show that it is more stable and achieves higher average clustering performance than the other ensemble algorithms.
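The soft-partition variant can be sketched in the same way. The sketch below is an assumption-laden illustration, not the derivation in the thesis: it runs standard fuzzy c-means several times to obtain membership matrices, then fits a finite mixture model by EM in which each run contributes one categorical distribution per mixture component and the fuzzy memberships act as fractional counts, so that log P(x_i | k) = Σ_h Σ_l u_il^(h) log θ_hkl. This likelihood is modeled on the multinomial mixture used for ensembles of hard labels; the exact form, the smoothing constant `eps`, the names `fuzzy_cmeans` and `mixture_consensus`, and the synthetic data are hypothetical.

```python
import math
import random


def fuzzy_cmeans(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means; returns an n x c membership matrix whose rows sum to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, normalised per point
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    for _ in range(iters):
        # Centre update: means weighted by u_ik ** m.
        centres = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            tw = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / tw for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        for i in range(n):
            dist = [math.dist(points[i], ctr) + 1e-12 for ctr in centres]
            u[i] = [1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                    for k in range(c)]
    return u


def mixture_consensus(memberships, n_components, iters=200, seed=0, eps=1e-6):
    """EM for a finite mixture over soft partitions (hypothetical likelihood, see lead-in).

    memberships[h][i][l] is the membership of point i in cluster l of run h.
    Assumed model: log P(point i | k) = sum_h sum_l memberships[h][i][l] * log theta[h][k][l].
    """
    rng = random.Random(seed)
    n, K = len(memberships[0]), n_components
    r = []
    for _ in range(n):  # random initial responsibilities r[i][k]
        row = [rng.random() + 1e-3 for _ in range(K)]
        total = sum(row)
        r.append([x / total for x in row])
    for _ in range(iters):
        # M-step: mixing weights pi[k] and categorical parameters theta[h][k][l].
        pi = [(sum(r[i][k] for i in range(n)) + eps) / (n + K * eps) for k in range(K)]
        theta = []
        for u in memberships:
            run = []
            for k in range(K):
                counts = [sum(r[i][k] * u[i][l] for i in range(n)) + eps for l in range(len(u[0]))]
                total = sum(counts)
                run.append([x / total for x in counts])
            theta.append(run)
        # E-step: responsibilities, normalised with the log-sum-exp trick.
        for i in range(n):
            logp = []
            for k in range(K):
                v = math.log(pi[k])
                for h, u in enumerate(memberships):
                    v += sum(u_il * math.log(t) for u_il, t in zip(u[i], theta[h][k]))
                logp.append(v)
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            total = sum(w)
            r[i] = [x / total for x in w]
    # Hard consensus label: the component with the largest responsibility.
    return [max(range(K), key=lambda k: r[i][k]) for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: two well-separated 2-D Gaussian blobs of 20 points each.
    data = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
            + [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(20)])
    # Soft ensemble: different numbers of clusters and random initialisations.
    runs = [fuzzy_cmeans(data, c, seed=s) for s, c in enumerate([2, 2, 2, 2, 3, 3])]
    print(mixture_consensus(runs, n_components=2))
```

With 0/1 memberships this likelihood reduces to an ordinary mixture of categorical distributions over label vectors; the soft version lets a point with ambiguous memberships spread its weight across clusters instead of committing to one label per run.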
Keywords/Search Tags: clustering, clustering ensemble, finite mixture model, EM algorithm, fuzzy clustering