Font Size: a A A

Research On Fuzzy Clustering Algorithm

Posted on:2013-08-03Degree:DoctorType:Dissertation
Country:ChinaCandidate:S CengFull Text:PDF
GTID:1228330392455653Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
Cluster Analysis is one of the core technologies of Knowledge Discovery and Data Mining (KDD).Fuzzy C-Means (FCM) clustering algorithm is the prototype-based clustering algorithm with thecharacteristics of simpleness, robustness, high efficiency, strong data adaptability, which make it oneof the most frequently-used cluster analysis algorithms. The most concerned issues are:(1) To definethe target function in the FCM algorithm in a proper way, and the target function not only reflects theprinciples set by the distance ratio of within/between class but also reflects the various features andthe importance of different samples;(2) No matter how the target function in the FCM algorithm isdefined, there is a corresponding clustering prototype. And the convergence speed, as well asclustering effect, must depend on the initial partition. How to create a algorithm based on fuzzy theoryto avoid the problem of clustering prototype and fundamentally solve the sensitivity of initial divisionis the issue;(3) Another issue is how to depict the semi-supervised algorithm FCM appropriately tomake supervision sample not only embody its typicality but also not lose its limitations;(4) How toreduce the computation of FCM algorithm is also a concern.For different types of data samples, the traditional FCM algorithm usually requires different typesof distance measure function. To settle this problem, the FCM clustering algorithm based on manifoldlearning is proposed. For high-dimensional feature samples, a more appropriate distance measure ofsimilarity between samples is put forward from the study of a limited number of training samples.And the FCM clustering algorithm of dimensionality reduction is proposed. In addition, theKullback-Leibler (KL) divergence is introduced into the objective function of traditional FCMalgorithm to regularize the FCM. And the distance function uses a Gaussian mixture distribution,which performs well when applied to image segmentation. A new FCM clustering algorithm(FCM_GMM_PSKL) based on symmetric KL distance of the Gaussian mixture distribution and theKL divergence regularization is proposed. The FCM_GMM_PSKL works well in practice.The FCM is sensitive to the initial prototypes, and it doesn’t work well only if the clusters areglobular. In this paper, the feature-weighted FCM clustering algorithms are proposed. And amulti-prototype FCM algorithm based on similarity and spectral decomposition is proposed. Largeclusters or elongated shaped clusters are first divided into lots of small clusters by using thefeature-weighted FCM algorithm. The memberships, which represent the degrees those samplesbelong to, are used as the features to compute the similarity among small clusters. The Folydalgorithm, developed from the Zadeh’s operations, is used to modify the similarity. The real affinitymatrix, got from former steps, is regarded as the similarity matrix to generate the Laplacian matrixused in spectral decomposition. At last, the FCM algorithm is applied again to cluster the spectralfeatures to merge the small clusters. The eigenvectors corresponding to the second minimaleigenvalues (issues in connection with curve segmentation) and the third minimal eigenvalues (issuesin connection with surface segmentation) are chosen to be the spectral features.Most variants of Fuzzy C-Means (FCM) clustering algorithms involving prior knowledge aregenerally based on the modification of the objective function or the clustering process. This paperproposes a new weighted semi-supervised FCM algorithm (SSFCM-HPR) that transforms the prior knowledge in the labeled samples into constraint conditions in terms of fuzzy membership degrees,assigns different weights according to the representativeness of the samples, and then uses the HPRmultiplier to solve the clustering problem. The “representativeness” of the labeled samples is decidedby their distances to the cluster centers they belong to. In this paper we take the ratio of the largest tothe second largest fuzzy membership degree from a labeled sample as its weight. This algorithm notonly retains the fuzzy partition of the labeled samples which guarantees the effective guidance on theclustering process, but also can detect whether a sample is an outlier or not. Moreover, when part ofthe supervised information of the labeled samples is wrong, this algorithm can reduce the influence ofthe incorrectly labeled samples on the final clustering results.
Keywords/Search Tags:Fuzzy C-means Algorithm, Kullback-Leibler (KL) Divergence, Gaussian MixtureModels, Spectral Decomposition, HPR Multiplier, Semi-Supervised Fuzzy Clustering
PDF Full Text Request
Related items