The Research Of Clustering And Ensemble Clustering Based On Cluster-Mode

Posted on:2012-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:J W Geng

Full Text:PDF

GTID:2218330338470876

Subject:Computer software and theory

Abstract/Summary:

With the development and widespread application of computer technology, large amounts of data have been accumulated in many fields. Because these datasets are so numerous and so large, they cannot be processed manually, and data mining technology was proposed in response. Clustering is one of the most active topics in data mining. Clustering raises two issues: how to compute the similarity of two data objects, and how to partition the data objects in a dataset so that strongly similar objects are assigned to the same cluster while weakly similar objects are assigned to different clusters. Every single clustering algorithm has its own limitations and is suited to particular dataset structures. Some clustering algorithms perform well on small datasets but poorly on large ones; some tend to recognize only symmetrical, hyperspherical clusters; some are suited to compact datasets; some are sensitive to outliers. Clustering ensembles can address these limitations of single clustering algorithms: they can improve a system's generalization capability and stability, and also its accuracy. Clustering algorithms and clustering ensemble algorithms are studied in this paper. The classification of clustering algorithms and the classic clustering algorithms are introduced. The main steps of clustering analysis, common data types, and similarity measures for different kinds of data objects are described. Following an in-depth study of hierarchical clustering, an improved hierarchical clustering algorithm named REPBFC (REpresentative-Points Based Fast Clustering) is proposed.
REPBFC is a hierarchical agglomerative clustering algorithm that uses several representative points to represent each cluster, so that it can recognize irregular and non-hyperspherical clusters; based on the 90-10 rule, it completes the whole clustering in two stages. REPBFC reduces time complexity compared with traditional hierarchical clustering. This paper also introduces current research topics in clustering ensembles: how to generate a diverse collection of clusterings, how to measure the diversity of such a collection based on mutual information, and how to design a consensus function. It proposes the concept of Cluster-Mode, which is built from the original clustering results, and two clustering ensemble algorithms based on Cluster-Mode: ECBCMP (Ensemble Clustering algorithm Based on Cluster-Mode and Partitioning methods) and ECCCM (Ensemble Clustering with Combining Cluster-Mode). Both algorithms are implemented in C++. Tests on the Iris and Wine datasets and on artificial datasets show that the two algorithms are effective.
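
The abstract does not give the exact diversity measure used in the thesis. As a rough illustration of what a mutual-information-based diversity measure can look like, the sketch below computes the normalized mutual information (NMI) between two clusterings of the same objects and defines ensemble diversity as one minus the mean pairwise NMI. The function names, the normalization by the geometric mean of the two entropies, and the "one minus mean NMI" definition are all assumptions made for this example, not details taken from the thesis.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two partitions of the same objects.

    Assumption: normalize by sqrt(H(a) * H(b)); other normalizations exist.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    # Mutual information summed over the non-empty cells of the contingency table.
    mutual_info = 0.0
    for (a, b), n_ab in count_ab.items():
        mutual_info += (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))

    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a == 0.0 or entropy_b == 0.0:
        # At least one partition puts everything in a single cluster.
        return 1.0 if entropy_a == entropy_b else 0.0
    return mutual_info / math.sqrt(entropy_a * entropy_b)


def ensemble_diversity(partitions):
    """Hypothetical diversity score: 1 minus the mean pairwise NMI."""
    pairs = [(i, j) for i in range(len(partitions))
             for j in range(i + 1, len(partitions))]
    if not pairs:
        raise ValueError("need at least two partitions")
    mean_nmi = sum(normalized_mutual_information(partitions[i], partitions[j])
                   for i, j in pairs) / len(pairs)
    return 1.0 - mean_nmi
```

Under this definition, two clusterings that differ only in how the clusters are numbered have an NMI of 1 and contribute no diversity, while statistically independent clusterings have an NMI of 0.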

Keywords/Search Tags:

clustering analysis, hierarchical clustering, Cluster-Mode, ensemble clustering

Related items

1	Research On The Effectiveness Element Theory And Method Of Clustering Ensemble
2	Study Of Cluster Ensemble Methods Based On Hierarchical Clustering
3	Research On Key Technologies Of Co-Cluster And Co-Clustering Ensemble
4	Clustering Ensemble Algorithm Based On Mixed Data Representation
5	Study On H-K Clustering Algorithms Based On Ensemble Learning
6	Ksummary Analysis Method Based On Adaptive Multiple Clustering
7	Research On Density Cluster Centers Constrained Hierarchical Clustering
8	Research On Hybrid Algorithm Based On Subtractive Clustering
9	Research On Ensemble Clustering Algorithm Based On Bilateral Clustering
10	Research On Weighted Cluster Ensemble Algorithm Based On Validity Evaluation