
The Study And Improvement Of Fuzzy C-means Cluster Algorithm

Posted on: 2015-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Wang
GTID: 2298330431492576
Subject: Computer software and theory
Abstract/Summary:
Clustering analysis is one of the most important technologies and research hotspots in the field of data mining. It has produced fruitful results in both theory and method and plays a key role in data analysis in many fields. The partition-based k-means algorithm is the most classical clustering algorithm and has been applied in numerous domains. The fuzzy C-means (FCM) algorithm is a variant of k-means that keeps its simple and easy-to-use character. FCM has nearly linear time complexity, and it is effective and scalable for large-scale data mining.

In multi-dimensional data sets, Euclidean-distance-based similarity does not differ markedly between objects. To address this problem, we propose a fuzzy C-means clustering algorithm based on the coefficient of variation. The algorithm uses a Euclidean distance weighted by the coefficient of variation and introduces an initial cluster center selection method based on maximum distance. This method takes the reciprocal of the sum of an object's k-nearest-neighbor (KNN) distances as the object's density, filters out outliers and noise points, and selects, among the high-density objects, the objects at maximum distance as cluster centers. The membership matrix is computed from the weighted Euclidean distance, and the new cluster centers are updated from the membership matrix. Experimental results show that the proposed algorithm is superior to the original fuzzy C-means algorithm.

A weighted fuzzy C-means clustering algorithm is also proposed to improve clustering accuracy on mixed data. The weights are calculated from the sum of the distances on the numerical attributes and on the categorical attributes. On the numerical attributes, the initial centroids are selected randomly, and the membership matrix is computed from the weighted Euclidean distance and used to update the cluster centers.
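Both algorithms above compute the membership matrix from a weighted Euclidean distance and then update the centers from that matrix. The following is a minimal plain-Python sketch of the coefficient-of-variation variant, not the thesis's implementation. The abstract does not give exact formulas, so several details are assumptions: the weight of attribute j is its coefficient of variation (population standard deviation divided by the absolute mean), normalized so the weights sum to 1; outlier filtering keeps objects whose KNN density is at least the mean density; the "maximum distance" rule is read as max-min selection starting from the densest object; and the update steps are the standard FCM formulas with fuzzifier m = 2. The function names `cv_weights`, `weighted_dist`, `init_centers` and `fcm` are hypothetical.

```python
import math
import statistics


def cv_weights(data):
    # Assumed weighting: CV_j = population std / |mean| of attribute j, normalized to sum to 1.
    n_attrs = len(data[0])
    cvs = []
    for j in range(n_attrs):
        column = [row[j] for row in data]
        mean = statistics.fmean(column)
        cvs.append(statistics.pstdev(column) / abs(mean) if mean != 0 else 0.0)
    total = sum(cvs)
    return [cv / total for cv in cvs] if total > 0 else [1.0 / n_attrs] * n_attrs


def weighted_dist(a, b, w):
    # Euclidean distance with per-attribute weights w_j applied to the squared differences.
    return math.sqrt(sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w)))


def init_centers(data, c, k, w):
    n = len(data)
    # Density of an object = 1 / (sum of distances to its k nearest neighbours).
    density = []
    for i in range(n):
        dists = sorted(weighted_dist(data[i], data[j], w) for j in range(n) if j != i)
        density.append(1.0 / (sum(dists[:k]) + 1e-12))
    # Assumed noise filter: keep objects whose density is at least the mean density.
    threshold = statistics.fmean(density)
    dense = [i for i in range(n) if density[i] >= threshold]
    if len(dense) < c:
        dense = list(range(n))
    # Start from the densest object, then repeatedly add the dense object that is
    # farthest from its nearest already-chosen center (max-min distance).
    chosen = [max(dense, key=lambda i: density[i])]
    while len(chosen) < c:
        chosen.append(max(dense, key=lambda i: min(weighted_dist(data[i], data[j], w) for j in chosen)))
    return [list(data[i]) for i in chosen]


def fcm(data, c, m=2.0, k=3, max_iter=100, tol=1e-6):
    w = cv_weights(data)
    centers = init_centers(data, c, k, w)
    n_attrs = len(data[0])
    p = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_l (d_ik / d_il)^(2/(m-1)).
        u = []
        for x in data:
            d = [weighted_dist(x, ck, w) for ck in centers]
            if min(d) == 0.0:  # object coincides with a center: crisp membership
                z = d.index(0.0)
                u.append([1.0 if kk == z else 0.0 for kk in range(c)])
            else:
                u.append([1.0 / sum((d[kk] / dl) ** p for dl in d) for kk in range(c)])
        # Center update: c_k = sum_i u_ik^m x_i / sum_i u_ik^m.
        new_centers = []
        for kk in range(c):
            wts = [row[kk] ** m for row in u]
            tot = sum(wts)
            if tot == 0.0:  # no support for this cluster: keep the old center
                new_centers.append(centers[kk])
                continue
            new_centers.append([sum(wt * x[j] for wt, x in zip(wts, data)) / tot for j in range(n_attrs)])
        shift = max(weighted_dist(a, b, w) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u, w
```

On two well-separated groups of points, `fcm(data, c=2)` should place one center in each group. The defaults `m=2.0`, `k=3` and `tol=1e-6` are illustrative choices, not parameters reported in the thesis.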
On the categorical attributes, the initial centroids are determined by the clusters obtained from the random initial centroids on the numerical attributes. Each object is then assigned, on the categorical attributes, to the centroid for which its membership is highest, and each centroid is represented by the frequencies of the values of each attribute among the objects belonging to its cluster. Experimental results show that the proposed algorithm can discover clusters in mixed numeric and categorical data sets, and that its accuracy is slightly higher than that of existing similar algorithms.
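The categorical side of the mixed-data algorithm can be sketched in the same hedged spirit. Each cluster's centroid is stored as the relative frequency of every value of every categorical attribute among the cluster's members, and each object is reassigned to the centroid in which its membership is highest. The abstract does not say how categorical memberships are computed, so this sketch assumes a frequency-based dissimilarity of the kind used by k-representatives clustering (one minus the relative frequency of the object's value, summed over attributes), turned into memberships with the FCM formula for m = 2. The starting labels stand in for the partition produced on the numerical attributes; `frequency_centroids`, `categorical_dist`, `memberships` and `refine_categorical` are hypothetical names.

```python
from collections import Counter


def frequency_centroids(records, labels, c):
    # For each cluster and each attribute: relative frequency of every value among members.
    n_attrs = len(records[0])
    centroids = []
    for k in range(c):
        members = [r for r, lab in zip(records, labels) if lab == k]
        size = len(members)
        centroids.append([
            {v: cnt / size for v, cnt in Counter(r[j] for r in members).items()} if size else {}
            for j in range(n_attrs)
        ])
    return centroids


def categorical_dist(record, centroid):
    # Assumed dissimilarity: sum over attributes of (1 - relative frequency of the record's value).
    return sum(1.0 - freq.get(v, 0.0) for v, freq in zip(record, centroid))


def memberships(record, centroids, m=2.0):
    # FCM membership formula applied to the categorical dissimilarities.
    d = [categorical_dist(record, ck) for ck in centroids]
    if min(d) == 0.0:  # exact match with a centroid: crisp membership
        z = d.index(0.0)
        return [1.0 if k == z else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[k] / dl) ** p for dl in d) for k in range(len(d))]


def refine_categorical(records, labels, c, max_iter=20):
    # Alternate: rebuild frequency centroids, then move each record to its max-membership centroid.
    labels = list(labels)
    for _ in range(max_iter):
        centroids = frequency_centroids(records, labels, c)
        new_labels = []
        for r in records:
            u = memberships(r, centroids)
            new_labels.append(u.index(max(u)))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, frequency_centroids(records, labels, c)
```

Reassigning by maximum membership is equivalent here to assigning each record to the least dissimilar centroid, because the FCM membership decreases monotonically as the dissimilarity grows.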
Keywords/Search Tags: data mining, fuzzy C-means, clustering analysis, centroid selection, coefficient of variation, mixed data