The Research On Fuzzy C-means Algorithm

Posted on: 2011-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Cai
Full Text: PDF
GTID: 2178330332956554
Subject: Computer software and theory
Abstract/Summary:
With the development of database technology and the widespread use of database management systems, organizations have accumulated huge volumes of data. Data mining was proposed to extract useful information from these resources and make better use of them. It combines traditional data analysis methods with sophisticated algorithms for processing massive data, and it is a leading area of information and database technology.

As one of the main methods of data mining, cluster analysis partitions a data set into meaningful groups, or clusters. Among the many cluster analysis algorithms, fuzzy clustering is a current research hotspot. This paper studies the classical fuzzy c-means algorithm (FCM) and proposes an improved algorithm that addresses the disadvantages of FCM. Experimental results illustrate its effectiveness and feasibility.

This paper systematically analyzes the FCM algorithm and the basic principle of the Mahalanobis distance, uses the advantages of the Mahalanobis distance to remedy the defects of FCM, and uses optimized kernel PCA (KPCA) to extract features. FCM is improved in three respects.

First, FCM is based on the Euclidean distance function, which can only detect clusters with a spherical structure. In addition, when FCM processes data sets with highly correlated features, the probability of error increases. To address these two problems, this paper proposes an improved algorithm called fuzzy c-means based on the Mahalanobis distance function (FCM-M), which adds a regulating factor of the covariance matrix for each class to the objective function. By using the Mahalanobis distance, FCM-M effectively overcomes these shortcomings of FCM.
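As a rough illustration of the ideas above, the following Python sketch shows standard FCM with the Euclidean distance, together with a squared Mahalanobis distance helper. It is not the FCM-M algorithm of the thesis: the covariance regulating factor is not reproduced here, and the thesis itself used MATLAB. The names fcm and mahalanobis_sq, the fuzzifier m=2, the iteration limit, and the tolerance are all assumptions made for this example.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with squared Euclidean distance (illustrative)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers are membership-weighted means of the samples.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X to center."""
    diff = X - center
    # The pseudo-inverse keeps the distance defined when cov is singular.
    cov_inv = np.linalg.pinv(cov)
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
```

With an identity covariance matrix, mahalanobis_sq reduces to the squared Euclidean distance used inside fcm. A non-spherical covariance stretches or shrinks the distance along correlated directions, which is the property a Mahalanobis-based variant relies on.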
Efficient methods exist for the singular-value problems that arise when finding the eigenvalues and eigenvectors of a symmetric matrix or when computing the pseudo-inverse involved in the Mahalanobis distance.

Second, FCM treats all sample features as contributing equally to the clustering result and does not consider that different features may affect the result differently. When FCM processes data sets with highly correlated features, the probability of error increases. To address these two problems, this paper proposes an improved fuzzy clustering algorithm based on a feature-weighted Mahalanobis distance function. By using an adaptive Mahalanobis distance to weight the features, the new algorithm can effectively cluster highly correlated data sets.

Finally, kernel PCA is used to extract features from large, high-dimensional data sets, combined with cultural algorithms (CA) to select an optimal or near-optimal kernel function. FCM based on this method not only effectively extracts nonlinear information from the samples but also reduces dimensionality. The algorithms above are implemented in MATLAB. Experimental results on clustering UCI data sets and on image segmentation show the expected effect.
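The kernel PCA step can be sketched in Python as follows. This is a minimal example assuming a Gaussian (RBF) kernel with a fixed, hand-picked gamma; the thesis selects the kernel function with a cultural algorithm, which is not reproduced here, and its own implementation was in MATLAB. The function name rbf_kernel_pca and its parameters are hypothetical.

```python
import numpy as np

def rbf_kernel_pca(X, n_components, gamma=1.0):
    """Project training samples onto the leading kernel principal components."""
    # Pairwise squared Euclidean distances between samples.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # RBF kernel matrix; gamma is an assumed, fixed kernel width parameter.
    K = np.exp(-gamma * np.maximum(d2, 0.0))
    # Center the kernel matrix, i.e. center the data in feature space.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh handles symmetric matrices and returns ascending eigenvalues,
    # so reorder to pick the largest n_components.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Training-sample projections: eigenvector scaled by sqrt(eigenvalue).
    return vecs * np.sqrt(np.maximum(vals, 0.0))
```

The reduced features returned by rbf_kernel_pca could then be passed to a clustering routine such as FCM in place of the original high-dimensional samples.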
Keywords/Search Tags: Fuzzy theory, Fuzzy c-means, Mahalanobis distance, Cultural algorithms