Font Size: a A A

An Improved Affinity Propagation Clustering Algorithm For Reducing Complexity

Posted on:2020-05-13Degree:MasterType:Thesis
Country:ChinaCandidate:J LiuFull Text:PDF
GTID:2428330602451840Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
In the era of big data,understanding the distribution and characteristics of data,and discovering useful information has become an important research topic.Therefore,many scholars have proposed data mining based on techniques such as machine learning,database and mathematical statistics.As an unsupervised data mining method,clustering can divide data into several categories according to similarity according to the internal distribution law of data.Although many scholars have proposed a variety of clustering algorithms,as the actual data scale continues to expand and the application scenarios become more complex,clustering algorithms still face many problems.AP(Affinity Propagation)clustering algorithm is a new clustering algorithm published in "Science".It does not need to specify the clustering category and the initial representative point in advance.It only needs to input the similarity matrix of the data points.The message delivery mechanism automatically iterates until the appropriate class representative point is determined.The results show that the algorithm performs better on most data sets than the traditional clustering algorithm.Although the AP clustering algorithm has many advantages over the traditional clustering algorithm and is applied to many fields,it still faces many challenges.First,although the AP clustering algorithm does not need to set the clustering category in advance,its input parameters,the value of the biasing parameter will indirectly affect the clustering category,which can describe the degree of bias of each data point as a class representative point.The larger the general bias parameter,the more the final clustering category,so how to set a suitable bias parameter value becomes a key problem in the AP clustering algorithm.Moreover,in the original AP clustering algorithm,the bias parameter values of all data points are the same constant,which is undoubtedly not suitable for the actual situation.Therefore,in this paper,we propose to set the bias parameter based on the data density for the value of the bias parameter,that is,to give different bias parameter values based on the distribution of the data points,and introduce the concept of grid to calculate the data density distribution,and minimize the additional calculation.Secondly,the computational complexity of the AP clustering algorithm is O(N×N×logN).When the data size is large,the time consumption of the algorithm will increase rapidly,and a huge similarity matrix needs to be stored.Therefore,this paper introduces two sampling algorithms to reduce the input data as much as possible,and then performs AP clustering algorithm after obtaining representative sample data.The first one uses the simplest and most efficient simple random sampling.In order to avoid the loss of small clusters caused by random sampling and the instability of sampling results,the idea of adding stratification is to first group the data by random sampling.The first level of the AP clustering algorithm is selected on the group to select the global representative sample points,and then the second cluster is performed on the sample points.The second method uses more complex density deviation sampling,sampling based on the density distribution of the data,can overcome the shortcomings of random sampling,and also introduces meshing when calculating the density of data points.In this way,the AP algorithm can be directly run through one sampling,and the efficiency of the algorithm is improved on the basis of the first algorithm.Finally,simulation experiments are carried out on UCI database and artificial dataset.The experimental results show that the improved algorithm based on bias parameters has a great improvement in clustering accuracy.The two improved algorithms based on sampling are better than AP clustering in algorithm efficiency.Algorithm,where the algorithm based on density deviation sampling takes the shortest time.Secondly,combined with optimization based on bias parameters and sampling-based algorithm,it is found that the combined algorithm is better than the single application in accuracy and efficiency.Finally,the algorithm is compared with other improved algorithms.
Keywords/Search Tags:Affinity Propagation Clustering Algorithm, Preference, Grid, Simple Random Sampling, Density Deviation Sampling
PDF Full Text Request
Related items