The Research Of Clustering And Ensemble Clustering Based On Cluster-Mode

Posted on: 2012-03-29 | Degree: Master | Type: Thesis
Country: China | Candidate: J W Geng
GTID: 2218330338470876 | Subject: Computer software and theory
Abstract/Summary:
With the development and widespread application of computer technology, large amounts of data have been accumulated in many fields. These datasets are too numerous and too large to process manually, and data mining technology was proposed to address this situation. Clustering is one of the most active topics in data mining. It involves two issues: how to compute the similarity between two data objects, and how to partition the data objects in a dataset so that objects with strong similarity are assigned to the same cluster while objects with weak similarity are assigned to different clusters. Every single clustering algorithm has its own limitations and suits particular dataset structures. Some clustering algorithms perform well on small datasets but poorly on large ones; some tend to recognize only symmetrical, hyperspherical clusters; some are suited to compact datasets; some are sensitive to outliers. Clustering ensembles can address these limitations of single clustering algorithms: they can improve the generalization capability and stability of the system, and also its accuracy. Clustering algorithms and clustering ensemble algorithms are studied in this paper. The classification of clustering algorithms and the classic clustering algorithms are introduced. The main steps of clustering analysis, common data types, and similarity measures for different kinds of data objects are described. Following an in-depth study of hierarchical clustering, an improved hierarchical clustering algorithm named REPBFC (REpresentative-Points Based Fast Clustering) is proposed.
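The two issues named above, measuring similarity and partitioning objects by it, can be illustrated with a minimal sketch of plain agglomerative clustering. This is not REPBFC: it assumes Euclidean distance as the dissimilarity measure and single linkage as the merge criterion, and the names `euclidean` and `single_linkage` are hypothetical, chosen only for this illustration.

```python
import math


def euclidean(a, b):
    """Dissimilarity between two numeric data objects (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, k):
    """Naive agglomerative clustering: start with one cluster per point and
    repeatedly merge the two closest clusters until k clusters remain.
    Returns a list of clusters, each a list of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, index of cluster i, index of cluster j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    euclidean(points[p], points[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid
    return clusters


# Illustrative data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
clusters = single_linkage(points, 2)
```

Recomputing every inter-cluster distance at each merge is what makes this naive form expensive on large datasets, which is the kind of cost a faster hierarchical method aims to reduce.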
REPBFC is a hierarchical agglomerative clustering algorithm that uses several representative points to represent each cluster, so that it can recognize irregular and non-hyperspherical clusters; based on the 90-10 rule, it completes the whole clustering in two stages. REPBFC reduces time complexity compared with traditional hierarchical clustering. This paper also introduces current research topics in clustering ensembles: how to generate a diverse collection of clusterings, how to measure the diversity of that collection based on mutual information, and how to design the consensus function. It proposes the concept of Cluster-Mode, which is built from the original clustering results, and two clustering ensemble algorithms based on Cluster-Mode: ECBCMP (Ensemble Clustering algorithm Based on Cluster-Mode and Partitioning methods) and ECCCM (Ensemble Clustering with Combining Cluster-Mode). Both algorithms are implemented in C++. Tests on the Iris and Wine datasets and on artificial datasets show that the two algorithms are effective.
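A mutual-information-based diversity measure for a collection of clusterings can be sketched as follows. The abstract does not give the exact formula used, so this sketch assumes a common choice: normalized mutual information (NMI) with geometric-mean normalization, and diversity defined as one minus the average pairwise NMI. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of the same objects."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (nij / n) * math.log(nij * n / (ca[x] * cb[y]))
        for (x, y), nij in cab.items()
    )


def nmi(a, b):
    """Normalized mutual information, I(a;b) / sqrt(H(a) * H(b)) (assumed form)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # A single-cluster partition carries no information; treat two such
        # partitions as identical and any other pairing as unrelated.
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)


def ensemble_diversity(partitions):
    """One minus the mean pairwise NMI; needs at least two partitions."""
    pairs = list(combinations(partitions, 2))
    return 1.0 - sum(nmi(a, b) for a, b in pairs) / len(pairs)


# Three clusterings of four objects: p2 is p1 with labels swapped,
# p3 is statistically independent of both.
p1 = [0, 0, 1, 1]
p2 = [1, 1, 0, 0]
p3 = [0, 1, 0, 1]
diversity = ensemble_diversity([p1, p2, p3])
```

Because NMI ignores the label names themselves, `p1` and `p2` count as the same partition here, so only `p3` contributes diversity to this small ensemble.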
Keywords/Search Tags:clustering analysis, hierarchical clustering, Cluster-Mode, ensemble clustering