Research And Application Of Clustering Parallel Strategy For Affinity Propagation

Posted on:2019-06-20

Degree:Master

Type:Thesis

Country:China

Candidate:W C Wang

Full Text:PDF

GTID:2348330569488301

Subject:Computer Science and Technology

Abstract/Summary:

Affinity propagation (AP) is a relatively new clustering algorithm based on class representative points. Its advantages are that it determines the number of clusters automatically and can handle data whose similarity relationships are asymmetric. The basic idea is to describe how closely two objects are related by maintaining a similarity matrix, an availability matrix, and a responsibility matrix between them, and to select the representative points by iteratively updating the availability and responsibility matrices. Experiments show that, at the same clustering accuracy, the algorithm needs far fewer iterations than other types of clustering algorithms, which indicates that it clusters efficiently as well as accurately. Its disadvantage is equally clear: each iteration is time-consuming. Therefore, although the Affinity Propagation algorithm achieves high clustering accuracy, it cannot process massive data. To address these problems, the main work of this thesis is as follows:

1. To select a parallel platform, Hadoop and Spark and their respective ecosystems are studied and analyzed in depth, and the characteristics and application scenarios of Hadoop MapReduce and Spark are summarized.

2. To address the inability of the Affinity Propagation algorithm to process massive data, its parallelization strategy is studied on the MapReduce platform in combination with efficient clustering evaluation indicators, and a semi-supervised distributed Affinity Propagation algorithm is proposed. The data is first partitioned and local representative points are obtained by clustering each partition; the local representative points and other supervisory parameters are then integrated globally, yielding class representative points that are globally representative. Experimental results show that the algorithm has good accuracy and efficiency. It is also applied to civil aviation over-limit events, and experiments show that it can analyze QAR data effectively.

3. To address the sensitivity of the classical K-means algorithm to the initial cluster centers and its inability to process massive data, a parallel K-means algorithm optimized by the Affinity Propagation algorithm is proposed. The distributed Affinity Propagation algorithm above is first modified and run, and its result is passed to the cluster nodes as the initial cluster centers for K-means; the K-means algorithm is then run in parallel. Experimental results show that the optimized algorithm has a clear advantage in processing time on big data.

4. Finally, the parallelization strategy of the Affinity Propagation algorithm is ported to the Spark platform, and a preliminary distributed Affinity Propagation algorithm is implemented on Spark. Experiments show that the strategy remains feasible on Spark.
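The message passing summarized in the abstract, and the idea of seeding K-means with its result, can be illustrated with a small single-machine sketch. This is not the thesis's distributed or semi-supervised implementation. It is the standard responsibility/availability update in plain Python, followed by ordinary K-means started from the representative points. The function names, the negative squared Euclidean similarity, the median preference, the damping factor, and the sample points are all assumptions made for the illustration.

```python
from statistics import median

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def affinity_propagation(points, damping=0.5, iters=200):
    """Return indices of the representative points (assumes len(points) >= 2)."""
    n = len(points)
    # Assumed similarity: negative squared Euclidean distance.
    S = [[-float(sq_dist(p, q)) for q in points] for p in points]
    # Assumed preference s(k, k): median of the off-diagonal similarities.
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibility matrix
    A = [[0.0] * n for _ in range(n)]  # availability matrix
    for _ in range(iters):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Point k is a representative point when a(k,k) + r(k,k) > 0.
    return [k for k in range(n) if A[k][k] + R[k][k] > 0]

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]

def kmeans(points, centers, iters=10):
    """Plain K-means (Lloyd iterations) starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assign(points, centers)

# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
exemplars = affinity_propagation(points)
centers, labels = kmeans(points, [points[k] for k in exemplars])
```

Each AP iteration here touches every entry of the n-by-n matrices, which is the per-iteration cost the abstract identifies as the obstacle to processing massive data on a single machine.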

Keywords/Search Tags:

Affinity Propagation, Clustering, Parallelization, Hadoop, Spark

Related items

1	Research And Application Of Parallelizing Affinity Propagation Clustering Based On Distributed Computing
2	Research And Application Of Affinity Propagation Based On Spark And Its Incremental Algorithm
3	Research On Affinity Propagation And Its Application In Image Clustering
4	Research On Affinity Propagation Clustering Algorithm
5	Improvement Study Of Affinity Propagation Clustering Algorithm And Its Applications
6	The Study Of Modified Affinity Propagation Clustering And Its Application
7	Research And Implementation Of Clustering And Neural Network Algorithm Based On Cloud Computing Platform Hadoop
8	Beyond Affinity Propagation: Message Passing Algorithms for Clustering
9	Research And Application Of Parallelization Optimization Of Spatial Clustering Algorithm Based On Spark
10	Research And Application Of Distributed Semantic Neighbor Search Algorithm Based On Spark