
Research And Application Of Clustering Parallel Strategy For Affinity Propagation

Posted on: 2019-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: W C Wang
GTID: 2348330569488301
Subject: Computer Science and Technology
Abstract/Summary:
Affinity Propagation (AP) is a relatively new clustering algorithm based on class representative points (exemplars). Its advantages are that it determines the number of clusters automatically and can handle data whose similarity relationships are asymmetric. Its basic idea is to represent how closely each pair of objects is related by maintaining three matrices between them (the similarity matrix, the availability matrix, and the responsibility matrix) and to select representative points by iteratively updating the availability and responsibility matrices. Experiments show that, for the same clustering accuracy, the algorithm needs far fewer iterations than other types of clustering algorithms, which shows that it reaches accurate clusterings efficiently. Its disadvantage is equally obvious: each iteration is expensive, because it updates n-by-n matrices, so time and memory grow quadratically with the number of data points n. Therefore, although Affinity Propagation achieves high clustering accuracy, it cannot process massive data. To address these issues, the main work of this thesis is as follows:

1. To choose a parallel platform, Hadoop and Spark and their respective ecosystems are studied and analyzed in depth, the characteristics of each are summarized, and the application scenarios suited to Hadoop MapReduce and to Spark are identified.
2. To enable Affinity Propagation to process massive data, a parallelization strategy for the algorithm is studied on the MapReduce platform in combination with efficient clustering evaluation indices, and a semi-supervised distributed Affinity Propagation algorithm is proposed. First, the data are partitioned and each partition is clustered to obtain local representative points; then the local representative points are combined with other supervisory parameters to perform a global clustering, which yields class representative points that represent the data set as a whole. Experimental results show that the algorithm has good accuracy and efficiency. The algorithm is also applied to civil aviation over-limit (exceedance) events, where experiments show that it can analyze QAR (Quick Access Recorder) data effectively.
3. To address the classical K-means algorithm's sensitivity to its initial cluster centers and its inability to process massive data, a parallel K-means algorithm optimized by Affinity Propagation is proposed. The distributed Affinity Propagation algorithm above is first modified and run, and its result is sent to the cluster nodes as the initial cluster centers for K-means; K-means is then run in parallel. Experimental results show that the optimized algorithm has a clear advantage in processing time on big data.
4. Finally, the parallelization strategy for Affinity Propagation is ported to the Spark platform, and a preliminary Spark-based distributed Affinity Propagation algorithm is implemented. Experiments show that the strategy remains feasible on Spark.
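The exemplar-selection loop described above, together with the partition-then-merge scheme of item 2, can be sketched in plain Python. This is a minimal single-machine illustration under stated assumptions, not the thesis's MapReduce code: it uses the standard damped responsibility/availability update rules, takes similarities as negative squared Euclidean distances, and by default sets the preference (the diagonal of the similarity matrix) to the median similarity. The helper names (`similarity_matrix`, `affinity_propagation`, `two_stage_ap`) and the contiguous-chunk partitioning are hypothetical. Each chunk's local AP run stands in for a map task, and the AP run over the pooled local exemplars stands in for the reduce task. The supervisory parameters and clustering evaluation indices that the thesis combines with the local exemplars are not modeled.

```python
from statistics import median


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def similarity_matrix(points, preference=None):
    """S[i][k] = -||x_i - x_k||^2; the diagonal holds the preference."""
    n = len(points)
    S = [[-sq_dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    if preference is None:
        # Assumption: median off-diagonal similarity, a common default.
        off = [S[i][k] for i in range(n) for k in range(n) if i != k]
        preference = median(off) if off else 0.0
    for k in range(n):
        S[k][k] = preference
    return S


def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Return (exemplar indices, exemplar index of each point)."""
    n = len(S)
    if n == 1:
        return [0], [0]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], damped
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k)), damped
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # Current exemplars: points with a(k,k) + r(k,k) > 0.
        exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        if exemplars and exemplars == last:
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            last, stable = exemplars, 0
    if not exemplars:
        # Assumption: fall back to the single strongest candidate.
        exemplars = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


def two_stage_ap(points, num_partitions):
    """Hypothetical partition-then-merge sketch of distributed AP."""
    size = -(-len(points) // num_partitions)  # ceiling division
    local = []
    for start in range(0, len(points), size):
        chunk = points[start:start + size]  # one "map" task per chunk
        ex, _ = affinity_propagation(similarity_matrix(chunk))
        local.extend(chunk[k] for k in ex)
    # "Reduce": cluster the pooled local exemplars to get global exemplars.
    ex, _ = affinity_propagation(similarity_matrix(local))
    centers = [local[k] for k in ex]
    # Assign every original point to its nearest global exemplar.
    assigned = [min(centers, key=lambda c: sq_dist(p, c)) for p in points]
    return centers, assigned


# Toy data: three shifted copies of one two-group pattern, so each
# contiguous chunk of six points plays the role of one partition.
base = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
data = [(x + dx, y + dy) for dx, dy in [(0, 0), (0, 1), (1, 0)] for x, y in base]
global_centers, assigned = two_stage_ap(data, num_partitions=3)
print(sorted(global_centers))
```

Because only the local exemplars reach the second stage, the merge step works on a few points per partition instead of the full n-by-n problem; the trade-off is that each final exemplar is chosen from candidates that were selected locally.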
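In item 3 the AP exemplars serve only as K-means seeds; the K-means iterations themselves parallelize naturally in the MapReduce style. The sketch below shows one common way to do this, assuming Euclidean distance and in-memory lists standing in for HDFS input splits. Each "map" call computes per-cluster partial coordinate sums and counts for its partition, and the "reduce" step merges them into new centers. The function names (`map_partition`, `reduce_partials`, `parallel_kmeans`) and the hard-coded seeds, which stand in for exemplars from a distributed AP run, are hypothetical and not taken from the thesis.

```python
from math import dist


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: dist(point, centers[j]))


def map_partition(partition, centers):
    """'Map' step: per-cluster coordinate sums and counts for one partition."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in partition:
        j = nearest(p, centers)
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    return sums, counts


def reduce_partials(partials, centers):
    """'Reduce' step: merge partial sums and recompute each center."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for s, c in partials:
        for j in range(k):
            counts[j] += c[j]
            for t in range(d):
                sums[j][t] += s[j][t]
    # An empty cluster keeps its previous center.
    return [tuple(v / counts[j] for v in sums[j]) if counts[j] else centers[j]
            for j in range(k)]


def parallel_kmeans(partitions, init_centers, max_iter=100, tol=1e-9):
    """K-means seeded with given centers; each map call would run on a worker."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        partials = [map_partition(part, centers) for part in partitions]
        new = reduce_partials(partials, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift <= tol:
            break
    return centers


# Hypothetical data split across two workers, seeded with two assumed exemplars.
partitions = [[(0, 0), (0, 1), (10, 10)], [(1, 0), (10, 11), (11, 10)]]
seeds = [(0, 0), (10, 10)]
km_centers = parallel_kmeans(partitions, seeds)
print(km_centers)
```

Each map call emits only k sums and k counts, so the data shuffled per iteration does not grow with partition size; this is the usual reason the assignment step of K-means parallelizes well once good seeds are available.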
Keywords/Search Tags: Affinity Propagation, Clustering, Parallelization, Hadoop, Spark