As an unsupervised machine learning method, fuzzy clustering divides unlabeled data samples into several classes so that samples within a class are as similar as possible and samples in different classes are as different as possible. Similarity between samples is measured by a distance metric, and different distance metrics can lead to different clustering results. At present, most fuzzy clustering algorithms use the Euclidean distance, which often fails to produce good clustering results on datasets with differing feature characteristics. Choosing a suitable distance metric can therefore improve the performance of a fuzzy clustering algorithm and yield a more accurate classification of the data. The main contents of this paper are as follows:

(1) The fuzzy C-means clustering algorithm (FCM) is easily affected by noisy data and outliers, so an improved FCM clustering algorithm (IFCM) is proposed that uses a function of the Euclidean distance as a new distance metric. On the X12 dataset, IFCM assigns the maximum fuzzy membership values to the class center points and more evenly distributed fuzzy membership values to the other points in each class, which verifies that the algorithm performs better in a noisy environment. IFCM is then compared and analyzed on the IRIS, IRIS-2D, Wine, Apple, and Red jujube datasets in terms of clustering accuracy, cluster centers, and number of iterations. The results show that IFCM achieves the highest clustering accuracies, 92.67%, 90.67%, 81.46%, 88%, and 87.78% on the five datasets respectively, and that its final cluster centers are closer to the real cluster centers.

(2) Inspired by the IFCM algorithm, the same distance metric, a function of the Euclidean distance, is applied to the fuzzy entropy clustering algorithm (FE), giving an improved fuzzy entropy clustering algorithm (IFE). On the X12 dataset, IFE produces more accurate cluster centers. The clustering accuracy, cluster centers, and number of iterations of IFE are compared and analyzed on the IRIS, IRIS-2D, Red jujube, and Meat datasets. The results show that although IFE needs more iterations, it achieves higher clustering accuracies of 92.67%, 90.67%, 88.33%, and 93.33% on the four datasets respectively, and its final cluster centers on the IRIS dataset are closer to the real cluster centers.

(3) To address the low clustering accuracy of the possibilistic fuzzy C-means clustering algorithm (PFCM) on non-hyperspherical datasets with heterogeneous density, a new distance metric is formed by normalizing the distance between the data points and the class centers and incorporating the distance variation of each data point into the original Euclidean distance. An improved possibilistic fuzzy C-means clustering algorithm (PFCM-σ) is proposed based on this new distance metric and the PFCM algorithm. The typicality values that PFCM-σ generates for the two noisy data points x19 and x20 in the X20 dataset are much smaller than those of the normal data points, indicating that PFCM-σ can accurately cluster datasets containing noisy data. The clustering accuracy, cluster centers, and number of iterations are computed on the IRIS, IRIS-3D, Olive, and Meat datasets. The results show that although PFCM-σ requires more iterations, it achieves the highest clustering accuracies, 93.33%, 92.67%, 93.33%, and 95.83% on the four datasets respectively, and its final cluster centers are closer to the real cluster centers.
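
For orientation, the baseline that these improvements start from can be sketched in a few lines. The code below is a minimal, textbook version of standard FCM with the plain Euclidean distance; it is not the IFCM, IFE, or PFCM-σ method, whose distance functions are not given in this abstract. The function names, the `dist` parameter (a hypothetical hook showing where an alternative distance metric would be substituted), the fuzzifier default `m=2.0`, and the toy data are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_memberships(points, centers, m=2.0, dist=euclidean, eps=1e-12):
    """Standard FCM membership update.

    u[k][i] = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)),
    where d_ki is the distance from point k to center i.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        d = [max(dist(x, c), eps) for c in centers]
        memberships.append(
            [1.0 / sum((d_i / d_j) ** power for d_j in d) for d_i in d]
        )
    return memberships


def update_centers(points, memberships, m=2.0):
    """Standard FCM center update: mean of the points weighted by u ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centers


def fcm(points, init_centers, m=2.0, dist=euclidean, tol=1e-6, max_iter=100):
    """Alternate the two updates until the centers move by less than tol."""
    centers = [list(c) for c in init_centers]
    memberships = []
    iterations = 0
    for iterations in range(1, max_iter + 1):
        memberships = update_memberships(points, centers, m, dist)
        new_centers = update_centers(points, memberships, m)
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships, iterations


# Toy data (assumed for illustration): two well-separated groups of three points.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_centers, toy_memberships, toy_iterations = fcm(toy_points, [(0, 0), (10, 10)])
```

Because every membership depends only on ratios of distances, swapping `dist` for another metric changes how strongly a far-away point (for example a noisy sample) pulls on each center, which is the lever that the distance-metric modifications summarized above act on.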