The Study Of Grid-based Clustering Ensemble Algorithm

Posted on: 2012-01-15; Degree: Master; Type: Thesis
Country: China; Candidate: Q L Cao
GTID: 2218330338457830; Subject: Computer software and theory
Abstract/Summary:
In supervised learning, ensemble classification has proven to be a popular and effective way to improve learning accuracy. Based on the same principle, a cluster ensemble combines the results of multiple partitions to obtain a clustering result of higher quality and robustness. Many cluster ensemble algorithms now exist, and a large body of theoretical and experimental results shows that a cluster ensemble has clear advantages over a single clustering. Among clustering algorithms, the grid-based approach learns a partition of a data set quickly and efficiently, but the boundary of each cluster it constructs is zigzag-shaped, which prevents it from recognizing smooth boundary surfaces. Based on an in-depth analysis of previous cluster ensemble algorithms and of the advantages and disadvantages of grid clustering, this paper proposes a grid-oriented cluster ensemble algorithm, Rotation Grid (RG for short), which effectively addresses the non-smooth treatment of cluster boundaries. Rotation Grid has two major aspects: (1) generating a number of diverse cluster members; (2) designing an ensemble function to fuse the cluster members. This paper focuses on these two aspects. To generate cluster members, instead of constructing diverse partitions of a given data set by random sampling or by varying the initial parameters of the underlying algorithm, RG iteratively splits the feature set into K subsets, applies a feature transformation method to the subsets to learn K different rotation bases, and applies a grid clustering algorithm to the new feature space formed by the K axis rotations to learn diverse partitions. For the design of the ensemble function, this paper follows a hypergraph-based approach.
Each cluster from every cluster member is written as a vector, each vector is treated as a vertex, and a weighted hypergraph is constructed. The edge weights between all vertices are calculated in order, the two clusters with the largest edge weight are identified and given the same label, and a voting method then determines which cluster each data point belongs to. Extensive experimental results show that, compared with single grid clustering, RG not only partitions data sets of arbitrary shape or size efficiently but also smooths the rough boundaries.
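The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not name the feature transformation, so PCA on each feature subset is assumed here, and the weighted hypergraph is replaced by a simpler stand-in that matches each cluster to the best-overlapping cluster of the first member (Jaccard overlap of indicator vectors as the edge weight) before voting. All function names and the parameters `bins` and `min_pts` are hypothetical.

```python
import numpy as np
from collections import deque

def rotation_matrix(X, k, rng):
    """Block rotation: an orthonormal basis learned on each of k random feature subsets."""
    d = X.shape[1]
    R = np.zeros((d, d))
    for idx in np.array_split(rng.permutation(d), k):
        if idx.size == 0:          # happens only when k > d
            continue
        Xs = X[:, idx] - X[:, idx].mean(axis=0)
        # Assumption: PCA is the feature transformation; eigenvectors of the
        # scatter matrix give an orthonormal basis for this subset.
        _, vecs = np.linalg.eigh(Xs.T @ Xs)
        R[np.ix_(idx, idx)] = vecs
    return R

def grid_cluster(Z, bins=8, min_pts=3):
    """Label points by connected components of dense grid cells; -1 marks sparse cells."""
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    cells = np.minimum(((Z - lo) / (hi - lo + 1e-12) * bins).astype(int), bins - 1)
    keys = [tuple(c) for c in cells.tolist()]
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    dense = {key for key, n in counts.items() if n >= min_pts}
    cell_label, next_label = {}, 0
    for start in dense:
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:               # breadth-first search over face-adjacent dense cells
            cur = queue.popleft()
            for axis in range(len(cur)):
                for step in (-1, 1):
                    nb = cur[:axis] + (cur[axis] + step,) + cur[axis + 1:]
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return np.array([cell_label.get(key, -1) for key in keys])

def consensus(members):
    """Relabel each member's clusters to the best-overlapping reference cluster, then vote."""
    ref = members[0]
    ref_ids = [c for c in np.unique(ref) if c >= 0]
    aligned = [ref]
    for labels in members[1:]:
        out = np.full_like(labels, -1)
        for c in np.unique(labels):
            if c < 0:
                continue
            mask = labels == c
            # Jaccard overlap of indicator vectors stands in for the hypergraph edge weight.
            weights = [np.sum(mask & (ref == r)) / np.sum(mask | (ref == r)) for r in ref_ids]
            if weights and max(weights) > 0:
                out[mask] = ref_ids[int(np.argmax(weights))]
        aligned.append(out)
    votes = np.stack(aligned)
    final = np.full(votes.shape[1], -1)
    for i in range(votes.shape[1]):
        valid = votes[:, i][votes[:, i] >= 0]
        if valid.size:
            final[i] = np.bincount(valid).argmax()   # majority vote per point
    return final

def rotation_grid(X, n_members=5, k=2, bins=8, min_pts=3, seed=0):
    """Cluster X in several rotated feature spaces and fuse the partitions."""
    rng = np.random.default_rng(seed)
    members = [grid_cluster(X @ rotation_matrix(X, k, rng), bins, min_pts)
               for _ in range(n_members)]
    return consensus(members)
```

Calling `rotation_grid(X)` on an `(n_samples, n_features)` array returns one label per point, with `-1` for points that fall in sparse cells in every member. Because each member draws its grid along differently rotated axes, the zigzag cell edges of one member tend not to coincide with those of another, which is the intuition behind the smoother fused boundary; in this sketch the quality of the result still depends on the first member, which serves as the labeling reference.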
Keywords/Search Tags: data mining, clustering algorithms, cluster fusion, grid algorithm, characteristic transformation