
PSO-SMFCM: A Novel PSO-based Multi-subset Fuzzy Clustering Algorithm

Posted on: 2019-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Xue
GTID: 2428330545997829
Subject: Computer Science and Technology
Abstract/Summary:
With the development of science and technology, the volume of data has exploded. Data mining has flourished as a way to extract useful information from massive data, and clustering, an important direction of data mining, has received unprecedented attention. The urgent need for timely, high-performance clustering algorithms that can handle large-scale data poses great challenges to traditional clustering algorithms.

The fuzzy c-means clustering algorithm (FCM) is one of the most popular fuzzy clustering algorithms. Traditional FCM requires the entire data set to be loaded into memory, so it cannot work when the data set is large. Many improved fuzzy c-means algorithms have been proposed to deal with large-scale data. Among them, the Scalable Random Sampling with Iterative Optimization Fuzzy C-Means algorithm (SRSIO-FCM) performs better than the other improved algorithms. SRSIO-FCM randomly divides the data points into multiple subsets and runs FCM on each subset. It combines the clustering results of the first two subsets to serve as the initial clustering centers of the third subset, and the results of all subsets are combined in this interlocking fashion to obtain the final clustering.

Although SRSIO-FCM is much more efficient than other clustering algorithms on large-scale data, it still has shortcomings. First, it selects the initial clustering centers randomly, which easily leads to unstable clustering results. Second, it does not take into account the distribution of data points within the subsets, which can make the clustering results inaccurate.

This thesis proposes a PSO-based Multi-subset Fuzzy Clustering Algorithm (PSO-SMFCM) that overcomes these shortcomings of SRSIO-FCM. First, particle swarm optimization (PSO) is introduced to determine high-quality initial clustering centers. This makes the clustering results more stable, and high-quality initial centers also reduce the number of iterations and shorten the computation time. Second, a new method is defined for merging the results of the subsets. It fully considers the distribution of data points across subsets and therefore effectively improves clustering accuracy. In addition, because FCM involves many iterations, the program runs on the Spark platform, which is well suited to iterative computation.

On the theoretical side, the ideas and steps of PSO-SMFCM and SRSIO-FCM are described and compared in detail. On the experimental side, extensive experiments demonstrate the advantages of the algorithm in efficiency, accuracy, and stability, showing that PSO-SMFCM is more efficient, more accurate, and more stable than other existing algorithms.
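To make the FCM updates and the multi-subset idea more concrete, here is a minimal single-machine NumPy sketch. `fcm` implements the standard FCM membership and center updates with fuzzifier `m`. `multi_subset_fcm` is a hypothetical simplification of the multi-subset scheme, not the exact SRSIO-FCM procedure and not the merging method proposed in this thesis: it randomly partitions the data, warm-starts FCM on each subset from the centers merged so far, and merges index-aligned centers with a membership-weighted average. All function names, parameters, and the merge rule are assumptions for illustration; the thesis itself runs on Spark.

```python
import numpy as np


def fcm(X, c, m=2.0, init_centers=None, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means. Returns (centers, U) where U has shape (n, c)."""
    rng = np.random.default_rng(seed)
    if init_centers is None:
        # Random initialization: pick c distinct data points as centers.
        centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    else:
        centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Squared distances from every point to every center, shape (n, c).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # avoid division by zero
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the data weighted by u_ik^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, U


def multi_subset_fcm(X, c, n_subsets=4, m=2.0, seed=0):
    """Hypothetical multi-subset FCM: random split, warm start, weighted merge."""
    rng = np.random.default_rng(seed)
    # Randomly partition the data into n_subsets disjoint subsets.
    subsets = np.array_split(rng.permutation(len(X)), n_subsets)
    merged, mass = None, None
    for idx in subsets:
        # The first subset uses random initialization; later ones warm-start.
        centers, U = fcm(X[idx], c, m=m, init_centers=merged, seed=seed)
        w = (U ** m).sum(axis=0)  # fuzzy "mass" each center received here
        if merged is None:
            merged, mass = centers, w
        else:
            # Warm-starting keeps center k aligned across subsets,
            # so the centers are merged index by index.
            merged = (merged * mass[:, None] + centers * w[:, None]) / (mass + w)[:, None]
            mass = mass + w
    return merged
```

For example, `multi_subset_fcm(X, c=3, n_subsets=4)` on an `(n, 2)` array `X` returns a `(3, 2)` array of centers. Because the first subset still starts from randomly chosen data points, different seeds can yield different results, which is the instability the abstract attributes to random initialization.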
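The PSO-based choice of initial centers can be sketched as follows, assuming a standard global-best PSO with an inertia weight in which each particle encodes `c` candidate centers and the fitness is the FCM objective J_m = Σ_i Σ_k u_ik^m ‖x_i − v_k‖², with memberships taken from the standard FCM update. The function names, the swarm size, the iteration count, and the coefficients `w`, `c1`, `c2` are illustrative assumptions, not values from the thesis.

```python
import numpy as np


def fcm_objective(X, centers, m=2.0):
    """J_m = sum_i sum_k u_ik^m * ||x_i - v_k||^2, with u from the FCM update."""
    d2 = np.fmax(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), 1e-12)
    inv = d2 ** (-1.0 / (m - 1))
    U = inv / inv.sum(axis=1, keepdims=True)
    return float(((U ** m) * d2).sum())


def pso_init_centers(X, c, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5,
                     m=2.0, seed=0):
    """Hypothetical global-best PSO search for c initial FCM centers.

    Returns (best_centers, best_fitness).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Each particle is a (c, d) matrix of centers, started at random data points.
    pos = np.stack([X[rng.choice(n, size=c, replace=False)]
                    for _ in range(n_particles)]).astype(float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fcm_objective(X, p, m) for p in pos])
    g = pbest[pbest_fit.argmin()].copy()
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + cognitive (personal best) + social (global best).
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        # Keep every candidate center inside the data's bounding box.
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([fcm_objective(X, p, m) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest[pbest_fit.argmin()].copy()
    return g, pbest_fit.min()
```

The returned centers would then be handed to FCM as its starting point instead of randomly chosen data points. The trade-off in this design is that every PSO iteration evaluates J_m once per particle, so the search cost has to be repaid by fewer FCM iterations and more stable results, which is the benefit the abstract claims for high-quality initial centers.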
Keywords/Search Tags: Fuzzy c-means clustering, Particle swarm optimization, Big data