
Research On Semi-supervised Clustering Ensemble Based On Dynamic Decision

Posted on: 2016-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: W C Xiao
GTID: 2308330461470353
Subject: Computer application technology
Abstract/Summary:
Clustering analysis is a discovery process that divides data into subsets. Each subset represents a cluster, and the division maximizes intra-cluster similarity while minimizing inter-cluster similarity. Clustering has been widely applied in many fields. Clustering ensembles can go beyond what a single clustering algorithm typically achieves in several respects, such as accuracy, robustness, stability and parallelization. Semi-supervised clustering ensemble techniques use prior knowledge to guide the ensemble process, and may therefore obtain better clustering results than unsupervised clustering ensembles.

In recent years, some scholars have combined swarm intelligence optimization algorithms with clustering analysis. The fundamental strategy is to convert the clustering problem into an optimization problem and then carry out a stochastic search, simulating the intelligent behavior of swarms, for the clustering partition with the best objective function value. Swarm intelligence algorithms that have been widely applied to clustering analysis include the ant colony algorithm, particle swarm optimization, the artificial immune algorithm, the shuffled frog leaping algorithm, the fish swarm algorithm and the bee colony algorithm, among others. Inspired by the fruit fly optimization algorithm, this thesis presents a clustering algorithm based on swarm intelligence. In each iteration, the algorithm moves the three-dimensional coordinates of the fruit flies to the global optimum position found in that iteration, and each fruit fly then carries out a stochastic search within the region around that position, so that the cluster centers are gradually optimized over the iterations. Compared with other swarm intelligence clustering algorithms, the proposed algorithm is simpler and has fewer parameters. The experimental results show that the proposed method outperforms the other algorithms in both accuracy and convergence time.

Clustering ensemble is an important part of ensemble learning.
It aims to study and integrate multiple clustering results for the same dataset, produced either by different clustering algorithms or by the same algorithm with different initial parameters. CHAMELEON is a hierarchical clustering algorithm that can discover natural clusters of different shapes and sizes, because its merging decisions dynamically adapt to the characteristics of the clusters being merged. Inspired by CHAMELEON, this thesis proposes a novel clustering ensemble model that incorporates a semi-supervised method. The model has three phases. Phase 1 constructs a sparse graph from a similarity matrix that aggregates the multiple clustering results. Phase 2 partitions the sparse graph (vertices = objects, edge weights = similarities) into a large number of relatively small sub-clusters. Phase 3 obtains the final clustering partition by repeatedly merging these sub-clusters. The experimental results show that the proposed method outperforms other ensemble algorithms in both accuracy and stability.
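The fruit fly search described in the abstract can be illustrated with a minimal FOA-style sketch for finding cluster centers. It rests on several assumptions that do not come from the thesis. Each fly encodes a complete set of k candidate centers. Fitness is the within-cluster sum of squared errors (`sse`, lower is better). The swarm location is the best center set found so far. Flies search uniformly within a hypothetical `radius` of that location. The abstract does not give the thesis's own three-dimensional coordinate encoding or its objective function, so this is not the author's method.

```python
import random


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center (lower is better)."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


def foa_cluster(points, k, n_flies=20, n_iter=100, radius=1.0, seed=0):
    """FOA-style stochastic search for k cluster centers (illustrative sketch).

    Assumption: each fly encodes a full set of k centers, and the swarm
    location is the best center set found so far. This is not the thesis's
    three-dimensional coordinate encoding, which the abstract does not specify.
    """
    rng = random.Random(seed)
    # Initialise the swarm location with k distinct data points.
    swarm = [list(p) for p in rng.sample(points, k)]
    best = sse(points, swarm)
    for _ in range(n_iter):
        # Each fly searches randomly within `radius` of the current swarm location.
        flies = [
            [[x + rng.uniform(-radius, radius) for x in c] for c in swarm]
            for _ in range(n_flies)
        ]
        scores = [sse(points, f) for f in flies]
        i = min(range(n_flies), key=scores.__getitem__)
        # The swarm moves to the best fly of this iteration only if it improves the objective.
        if scores[i] < best:
            swarm, best = flies[i], scores[i]
    return swarm, best


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, score = foa_cluster(pts, k=2)
```

The swarm only moves when an iteration improves the objective, so the returned score never exceeds the score of the initial centers. This mirrors the "move to the best position, then search locally" loop in the abstract. The sketch has few parameters (`n_flies`, `n_iter`, `radius`), which fits the abstract's point that this family of algorithms is simple to tune.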
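The three-phase ensemble model can be sketched roughly as follows, under stated assumptions. For Phase 1, the similarity matrix is assumed to be a co-association matrix, where `S[i][j]` is the fraction of base partitions that put objects i and j in the same cluster. That matrix is then sparsified into a k-nearest-neighbour graph, similar to the graph CHAMELEON starts from. Phases 2 and 3 use deliberately simple stand-ins. Phase 2 takes connected components after dropping edges below a hypothetical `threshold`, in place of CHAMELEON's graph partitioning. Phase 3 uses average-link merging, in place of CHAMELEON's relative-interconnectivity and relative-closeness criterion. The abstract does not say how the semi-supervised prior knowledge is used, so the optional `cannot_link` pairs that block merges are a hypothetical illustration only.

```python
from itertools import combinations


def co_association(partitions):
    """Phase 1a: S[i][j] = fraction of base partitions that put i and j together."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]


def knn_sparse_graph(S, k):
    """Phase 1b: keep edge (i, j) if j is among i's k most similar objects."""
    n = len(S)
    edges = {}
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i), key=lambda j: -S[i][j])[:k]
        for j in nbrs:
            if S[i][j] > 0:
                edges[frozenset((i, j))] = S[i][j]
    return edges


def sub_clusters(n, edges, threshold):
    """Phase 2 (stand-in): connected components after dropping weak edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for e, w in edges.items():
        if w >= threshold:
            i, j = tuple(e)
            parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def merge(clusters, S, n_clusters, cannot_link=()):
    """Phase 3 (stand-in): repeatedly merge the most similar pair (average link),
    skipping pairs that would violate a hypothetical cannot-link constraint."""
    clusters = [list(c) for c in clusters]
    banned = {frozenset(p) for p in cannot_link}
    while len(clusters) > n_clusters:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            if any(frozenset((i, j)) in banned for i in clusters[a] for j in clusters[b]):
                continue
            sim = sum(S[i][j] for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if sim > best:
                best, pair = sim, (a, b)
        if pair is None:  # every remaining merge would violate a constraint
            break
        a, b = pair  # a < b, so popping b leaves index a valid
        merged = clusters.pop(b)
        clusters[a].extend(merged)
    return sorted(sorted(c) for c in clusters)


partitions = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2], [1, 1, 1, 0, 0, 0]]
S = co_association(partitions)
edges = knn_sparse_graph(S, k=2)
subs = sub_clusters(len(S), edges, threshold=0.9)
print(merge(subs, S, n_clusters=2))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy example, Phase 2 yields the small sub-clusters `[0, 1]`, `[2]`, `[3]` and `[4, 5]`, and Phase 3 merges them into two final clusters. Passing `cannot_link=[(0, 2)]` to `merge` changes the result to `[[0, 1], [2, 3, 4, 5]]`, which shows how a constraint could steer the merging decisions.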
Keywords: clustering analysis, swarm intelligence, semi-supervised clustering ensemble, dynamic decision