Research On Semi-Supervised Fuzzy Clustering Ensemble

Posted on: 2016-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: C F Feng
Full Text: PDF
GTID: 2308330461470432
Subject: Signal and Information Processing
Abstract/Summary:
Data clustering is one of the important tools in data mining. It aims to group data into clusters such that the data in the same cluster are more similar to each other than to data in different clusters. Although a large number of algorithms have been introduced for clustering categorical data, no single clustering algorithm performs best on all data sets or can discover every type of cluster shape and structure present in data. As data sets with varied shapes and structures become common, finding a single clustering algorithm that suits a given data set has become increasingly difficult. Clustering ensemble methods combine several clustering algorithms, or one algorithm run with different parameter settings, to yield a single overall clustering. However, most existing ensemble algorithms are unsupervised and cannot take advantage of known information about the data set, which degrades the precision of the clustering ensemble. Semi-supervised clustering ensembles can play a significant role here by combining the advantages of semi-supervised clustering and clustering ensembles.

Most traditional ensemble algorithms take the results of hard clustering as input. However, most real-world samples are fuzzy in nature, so using hard clustering as the base clustering may lose useful information. At the same time, the base clustering results carry underlying information that may improve clustering performance and quality. Because many clustering ensemble methods either ignore this underlying information or acquire it through complex approaches, a link-based fuzzy cluster ensemble (LBFCE) is proposed. To obtain the underlying information clearly and efficiently, the matrix that represents the relations between data and clusters is transformed, through appropriate link analysis, into a weighted graph of relations among the data. A graph partitioning algorithm is then employed to obtain the final clustering result.

In the research on semi-supervised clustering ensembles, pairwise constraints are selected to guide the ensemble step. When selecting pairwise constraints, it is necessary to consider not only the relationship between the two constrained points but also the relationships among their neighbors, so that the prior information can be extended. Two effective patterns for selecting the constraint neighbors are presented, based on a given radius or on a Gaussian distribution. The correlation information of the data itself is combined with the semi-supervised information to extend the prior information, which is then used to guide the ensemble process. Experimental results demonstrate that the proposed approach can improve clustering performance effectively.
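
The pipeline described above (fuzzy base clusterings, a weighted graph over the data, pairwise constraints extended to neighbors within a radius, then graph partitioning) can be illustrated with a rough Python sketch. This is not the thesis's LBFCE algorithm. It assumes that the data-to-cluster matrix is a fuzzy membership matrix `U` whose rows sum to 1, it uses the product `U @ U.T` as a simple stand-in for the link analysis, and it uses a two-way spectral split as a stand-in for the graph partitioning algorithm, which the abstract does not name. All function names are hypothetical.

```python
import numpy as np

def coassociation_from_memberships(memberships):
    """Average U @ U.T over the base fuzzy partitions.

    Each U has shape (n_samples, n_clusters) with rows summing to 1.
    Assumption: this product stands in for the thesis's link analysis.
    """
    n = memberships[0].shape[0]
    W = np.zeros((n, n))
    for U in memberships:
        W += U @ U.T
    W /= len(memberships)
    np.fill_diagonal(W, 0.0)  # no self-loops in the data graph
    return W

def expand_must_links(X, must_links, radius):
    """Extend each must-link (i, j) to every pair (a, b) where a lies
    within `radius` of i and b lies within `radius` of j.

    Assumption: a plain Euclidean ball models the "given radius" pattern.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    expanded = set()
    for i, j in must_links:
        near_i = np.flatnonzero(D[i] <= radius)
        near_j = np.flatnonzero(D[j] <= radius)
        for a in near_i:
            for b in near_j:
                if a != b:
                    expanded.add((int(min(a, b)), int(max(a, b))))
    return expanded

def apply_must_links(W, must_links):
    """Return a copy of W with must-linked pairs raised to weight 1."""
    W = W.copy()
    for a, b in must_links:
        W[a, b] = W[b, a] = 1.0
    return W

def spectral_bipartition(W):
    """Split the graph in two by the sign of the Fiedler vector of the
    unnormalised Laplacian (a stand-in graph partitioner)."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)
```

To use the sketch, pass a list of membership matrices from several fuzzy clustering runs to `coassociation_from_memberships`, raise the weights of the extended must-link pairs with `apply_must_links`, and partition the result. Cannot-link constraints and the Gaussian neighbor pattern are omitted here, and the 0/1 labels returned by `spectral_bipartition` are arbitrary up to a swap.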
Keywords/Search Tags: clustering ensemble, fuzzy clustering, underlying information, data correlation