Research On Fuzzy Clustering Algorithm Based On Distance Metric

Posted on: 2022-05-14  Degree: Master  Type: Thesis
Country: China  Candidate: H X Zhou  Full Text: PDF
GTID: 2518306506971489  Subject: Control Engineering
Abstract/Summary:
Fuzzy clustering is an important technique in data mining. As an unsupervised machine learning method, it divides unlabeled sample data into multiple clusters so that samples in the same cluster are as similar as possible and samples in different clusters are as dissimilar as possible. The distance metric determines how similarity between sample points is measured, so the performance of a fuzzy clustering algorithm depends largely on the choice of distance metric. Fuzzy clustering based on the Euclidean distance, however, does not perform well on data whose features are distributed differently. Choosing an appropriate distance metric can therefore greatly improve the accuracy and stability of clustering. The work of this thesis covers the following three aspects.

(1) The clustering accuracy of the improved possibilistic c-means (IPCM) clustering algorithm is strongly affected by irregularly distributed data. To overcome this defect, an adaptive distance metric based on the fuzzy covariance matrix is applied to IPCM, giving an adaptive improved possibilistic c-means (AIPCM) clustering algorithm. AIPCM is evaluated on a tea dataset. First, multiplicative scatter correction (MSC), principal component analysis (PCA) and linear discriminant analysis (LDA) are used to process the tea dataset, and then AIPCM and traditional clustering algorithms are run on the processed data. Experimental results show that AIPCM has the highest clustering accuracy under different fuzzy weighting parameters and under different numbers of training and test samples. The distance between the terminal cluster centers of AIPCM and the true cluster centers is E_AIPCM = 0.3057, which is closer to the true cluster centers than the results of the compared algorithms. AIPCM needs only 29 iterations to converge, which is faster than the traditional clustering algorithms.

(2) The distance metric of IPCM is optimized again: an exponential distance metric based on the fuzzy covariance matrix is used to calculate the similarity between samples, giving an improved possibilistic Gath-Geva (IPGG) clustering algorithm. IPGG is evaluated by clustering an apple dataset. First, MSC and PCA are applied to the apple dataset, and then the results are analyzed in three respects: clustering accuracy, cluster centers and iterative convergence. Experimental results show that the clustering accuracy of IPGG is significantly higher than that of FCM, GK, GG and IPCM; the terminal cluster centers of IPGG are closer to the true cluster centers; and IPGG needs only 13 iterations to converge, so it converges faster than the other fuzzy clustering algorithms.

(3) Inspired by IPGG, the exponential distance metric based on the fuzzy covariance matrix is extended to the possibilistic fuzzy c-means (PFCM) clustering algorithm, giving a possibilistic fuzzy Gath-Geva (PFGG) clustering algorithm. When clustering the X20 dataset, PFGG identifies the noise data points x19 and x20, whose typicality values are 0.0141 and 0.0297 respectively, so it overcomes the sensitivity to noisy data. The clustering accuracy, cluster centers and iterative convergence of PFGG are then analyzed on three datasets (the Seeds, Coffee and Meat datasets). The results show that, as the fuzzy weighting parameters and coefficients are varied, PFGG has the highest clustering accuracy, its terminal cluster centers are the closest to the true cluster centers, and it converges the fastest.
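
The abstract names two distance metrics built on the fuzzy covariance matrix without giving their formulas. The sketch below is not taken from the thesis; it assumes the standard textbook forms of the Gustafson-Kessel (GK) adaptive distance, the Gath-Geva (GG) exponential distance and the possibilistic c-means typicality, which the IPCM, AIPCM, IPGG and PFGG variants may modify. All function and parameter names (`fuzzy_covariance`, `gk_distance_sq`, `gg_distance_sq`, `typicality`, `prior`, `eta`) are hypothetical, and the code is restricted to 2-D points so that the matrix inverse can be written out by hand.

```python
import math

def fuzzy_covariance(points, center, weights, m=2.0):
    """Fuzzy covariance matrix of one cluster (2-D points only).

    weights[k] is the membership of points[k] in this cluster and
    m is the fuzzy weighting parameter (assumed textbook definition).
    """
    sxx = sxy = syy = total = 0.0
    for (x, y), u in zip(points, weights):
        w = u ** m
        dx, dy = x - center[0], y - center[1]
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
        total += w
    return [[sxx / total, sxy / total], [sxy / total, syy / total]]

def _quadratic_form(point, center, cov):
    """Return ((x-v)^T F^-1 (x-v), det F) for a 2x2 covariance F."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Inverse of a 2x2 matrix is (1/det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return q, det

def gk_distance_sq(point, center, cov):
    """GK adaptive distance: det(F)^(1/n) * (x-v)^T F^-1 (x-v), n = 2."""
    q, det = _quadratic_form(point, center, cov)
    return math.sqrt(det) * q

def gg_distance_sq(point, center, cov, prior):
    """GG exponential distance: sqrt(det F) / prior * exp(q / 2).

    prior is the cluster's prior probability (an assumed input here).
    """
    q, det = _quadratic_form(point, center, cov)
    return math.sqrt(det) / prior * math.exp(0.5 * q)

def typicality(dist_sq, eta, m=2.0):
    """Standard possibilistic typicality; eta is a scale parameter."""
    return 1.0 / (1.0 + (dist_sq / eta) ** (1.0 / (m - 1.0)))

# A cluster elongated along the x-axis, all memberships equal to 1.
points = [(-2, 0), (-1, 0), (0, 0.5), (0, -0.5), (1, 0), (2, 0)]
center = (0.0, 0.0)
cov = fuzzy_covariance(points, center, [1.0] * len(points))

along = (2.0, 0.0)   # same Euclidean distance from the center ...
across = (0.0, 2.0)  # ... but perpendicular to the cluster's long axis
print(gk_distance_sq(along, center, cov), gk_distance_sq(across, center, cov))
print(gg_distance_sq(along, center, cov, 0.5), gg_distance_sq(across, center, cov, 0.5))
```

Both test points lie at the same Euclidean distance from the center, yet the covariance-based metrics rate the point along the cluster's long axis as much closer. This is the kind of shape adaptation the abstract attributes to the fuzzy covariance matrix. Under the assumed typicality formula, a point with a large distance receives a typicality near zero, which is how a possibilistic algorithm can flag noise points.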
Keywords/Search Tags: Data mining, Fuzzy clustering algorithm, Distance metric, Similarity, Fuzzy covariance matrix