The Study And Improvement Of Fuzzy C-means Cluster Algorithm

Posted on:2015-09-26

Degree:Master

Type:Thesis

Country:China

Candidate:Z B Wang

GTID:2298330431492576

Subject:Computer software and theory

Abstract/Summary:

Clustering analysis is one of the most important techniques and research hotspots in data mining. It has produced fruitful results in both theory and method and plays a key role in data analysis across many fields. The partition-based k-means algorithm is the most classical clustering algorithm and has been applied in numerous domains. The fuzzy C-means (FCM) algorithm is a variant of k-means that keeps its simplicity and ease of use. FCM has nearly linear time complexity and is effective and scalable for large-scale data mining.

In multi-dimensional data sets, similarity measures based on Euclidean distance often fail to separate objects clearly. To address this, we propose a fuzzy C-means clustering algorithm based on the coefficient of variation. The algorithm uses a Euclidean distance weighted by the coefficient of variation and introduces an initial cluster-center selection method based on maximum distance: the density of an object is taken as the reciprocal of the sum of its KNN (k-nearest-neighbor) distances, outliers and noise points are filtered out, and cluster centers are chosen from the remaining high-density objects by the maximum-distance criterion. The membership matrix is computed from the weighted Euclidean distance, and the new cluster centers are updated from the membership matrix. Experimental results show that the proposed algorithm outperforms the original fuzzy C-means.

A weighted fuzzy C-means clustering algorithm is also proposed to improve clustering accuracy on mixed data. Weights are calculated from the sum of the distances on numerical and categorical attributes. The centroids on numerical attributes are selected randomly, and the membership matrix is computed from the weighted Euclidean distance and used to update the cluster centers.
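The coefficient-of-variation algorithm described above (whose membership and center updates also match the numerical-attribute step of the weighted algorithm) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: the CV weights are normalized to sum to 1, the noise filter keeps objects whose KNN density is at or above the mean density, the maximum-distance rule starts from the densest remaining object and repeatedly adds the candidate farthest from the centers already chosen, and the updates are the standard FCM formulas with fuzzifier `m`. The function names `cv_weights`, `weighted_dist`, `init_centers`, `fcm` and the parameters `k`, `m`, `tol` are hypothetical.

```python
import numpy as np


def cv_weights(X):
    """Attribute weights from the coefficient of variation (std / |mean|).

    Assumption: weights are normalized to sum to 1; the exact
    normalization is not given in the abstract.
    """
    mean = X.mean(axis=0)
    cv = X.std(axis=0) / np.maximum(np.abs(mean), 1e-12)
    return cv / cv.sum()


def weighted_dist(A, B, w):
    """Pairwise CV-weighted Euclidean distances between rows of A and rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=2))


def init_centers(X, c, w, k=5):
    """KNN-density filtering followed by maximum-distance center selection."""
    D = weighted_dist(X, X, w)
    # Density = 1 / (sum of distances to the k nearest neighbors);
    # column 0 of each sorted row is the object itself (distance 0).
    knn_sum = np.sort(D, axis=1)[:, 1:k + 1].sum(axis=1)
    density = 1.0 / np.maximum(knn_sum, 1e-12)
    # Hypothetical noise filter: keep objects at or above the mean density.
    candidates = np.flatnonzero(density >= density.mean())
    if candidates.size < c:
        candidates = np.arange(len(X))
    # Start from the densest candidate, then repeatedly add the candidate
    # farthest from its nearest already-chosen center.
    chosen = [candidates[np.argmax(density[candidates])]]
    while len(chosen) < c:
        gap = D[np.ix_(candidates, chosen)].min(axis=1)
        chosen.append(candidates[np.argmax(gap)])
    return X[chosen].copy()


def fcm(X, c, m=2.0, k=5, max_iter=100, tol=1e-6):
    """Standard FCM membership/center updates using the CV-weighted distance."""
    w = cv_weights(X)
    V = init_centers(X, c, w, k)
    for _ in range(max_iter):
        d = np.maximum(weighted_dist(X, V, w), 1e-12)  # shape (n, c)
        # u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        # v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        Um = U ** m
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return U, V


# Usage: two well-separated groups of five points each.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.2], [0.9, 0.8],
              [10.0, 10.0], [10.2, 9.9], [9.8, 10.1], [10.1, 10.2], [9.9, 9.8]])
U, V = fcm(X, c=2, k=2)
labels = U.argmax(axis=1)
```

Seeding from the densest objects and spreading the remaining centers by maximum distance is one plausible reading of the described initialization; it avoids picking an outlier as a center and keeps initial centers apart, which is the stated goal.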
On categorical attributes, the initial centroids are determined by the clusters obtained from the randomly initialized centroids on numerical attributes. Each object is assigned, on its categorical attributes, to the centroid in which its membership is largest, and each centroid is represented by the set of frequencies of the values that appear in each attribute among the objects belonging to the cluster. Experimental results show that the proposed algorithm can discover clusters in mixed numeric and categorical data sets, with slightly higher accuracy than existing similar algorithms.
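The categorical side of the weighted algorithm, where each object is assigned by its largest membership and each centroid is a per-attribute frequency set, can be sketched as follows. Assumptions: the membership matrix is supplied directly (in the full method it would come from the numerical-attribute step), the names `categorical_centroids` and `categorical_distance` are hypothetical, and the object-to-centroid dissimilarity (one minus the relative frequency of the object's value, summed over attributes) is an assumed choice. The abstract does not state that dissimilarity or how the weights combine numerical and categorical distances, so the sketch stops short of that step.

```python
from collections import Counter

import numpy as np


def categorical_centroids(cat_rows, U):
    """Frequency-set centroids for the categorical attributes.

    cat_rows: one tuple of categorical values per object.
    U: (n, c) membership matrix from the numerical-attribute step.
    """
    U = np.asarray(U)
    # Each object goes to the cluster in which its membership is largest.
    labels = U.argmax(axis=1)
    n_attr = len(cat_rows[0])
    centroids = []
    for j in range(U.shape[1]):
        members = [row for row, lab in zip(cat_rows, labels) if lab == j]
        centroid = []
        for a in range(n_attr):
            # Relative frequency of every value of attribute a inside cluster j.
            counts = Counter(row[a] for row in members)
            total = sum(counts.values())
            centroid.append({v: n / total for v, n in counts.items()})
        centroids.append(centroid)
    return centroids


def categorical_distance(row, centroid):
    """Assumed dissimilarity: sum over attributes of (1 - frequency of the object's value)."""
    return sum(1.0 - freqs.get(value, 0.0) for value, freqs in zip(row, centroid))


# Usage: five objects with two categorical attributes and a given 2-cluster membership matrix.
cat = [("red", "s"), ("red", "m"), ("blue", "s"), ("green", "l"), ("green", "l")]
U = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]])
cents = categorical_centroids(cat, U)
d_same = categorical_distance(("red", "s"), cents[0])   # (1 - 2/3) + (1 - 2/3)
d_other = categorical_distance(("red", "s"), cents[1])  # (1 - 0) + (1 - 0)
```

Storing frequencies instead of a single mode keeps information about minority values in a cluster, so an object whose value is common but not dominant still receives a partial match.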

Keywords/Search Tags:

data mining, fuzzy C-means, clustering analysis, centroid selection, coefficient of variation, mixed data

Related items

1	Improvement And Application Of K-means Algorithm
2	Research And Application Of New Methods In Symbolic Clustering
3	Research On Fuzzy Clustering Analysis In Data Mining
4	The Modified K-MEANS Algorithm And Its Application To Type-Ⅰ Diabetes Glucose Data Clustering
5	Study Of Auto-Adaption Fuzzy C-Means Clustering Algorithm
6	Study And Analysis On Clustering Algorithm In Data Mining
7	Study On The Storage Allocation Strategy Of Goods Based On Data Mining
8	Fuzzy Data Mining Technique In The Application Of The Atmospheric System
9	The Improvement On The Fuzzy C-means Algorithm
10	Research Of Database Access Log Based On Weka