Fast Sparse Affinity Propagation Clustering Algorithm For Large-Scale And High-Dimensional Data

Posted on: 2020-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
GTID: 2428330590472541
Subject: Applied Mathematics
Abstract/Summary:
With the rapid development of information and Internet technology, the scale and complexity of the data generated by various industries keep increasing. Large-scale, complex data has two main characteristics. First, the amount of data is large and grows quickly. Second, the data dimension is high and contains many redundant features. Traditional data mining and processing algorithms often perform unsatisfactorily on such data, so how to efficiently extract valuable information from large-scale complex data has become a hot research topic. The Affinity Propagation clustering algorithm (AP algorithm) is a clustering algorithm based on message passing. It does not require the number of clusters to be specified in advance, and its clustering results are stable. However, as the scale and complexity of the data grow, the computational cost of the AP algorithm becomes prominent. This thesis focuses on improving the AP algorithm so that its advantages are retained while it is extended to large-scale and high-dimensional data.

On the one hand, for large-scale data, a fast sparse Affinity Propagation clustering algorithm based on core-point extraction is proposed, called the CFAP algorithm. Firstly, a core-set extraction method based on Gaussian kernel similarity is used to extract a core set and so reduce the size of the data. Then, borrowing the discriminant idea of the K-NN classification algorithm and the message-passing nature of the AP algorithm, each sample passes messages only to its K nearest samples, which makes the similarity matrix of the core set sparse. Finally, the CFAP algorithm is compared with the HAP algorithm and the AP algorithm on several data sets. Experimental analysis and comparison verify the time efficiency of the CFAP algorithm and the effectiveness of its clustering results.

On the other hand, for high-dimensional data, this thesis proposes applying the CFAP algorithm within the framework of the SAS-Clustering algorithm in order to extend CFAP to high-dimensional data. Firstly, to address a shortcoming of the SAS-Clustering framework itself, the golden-section search method is used instead of grid search, which greatly improves the efficiency of searching for the best feature set S. Secondly, to address the unstable results of the K-means clustering algorithm, the CFAP algorithm replaces K-means within the framework, yielding the SAS-CFAP algorithm. Finally, the robustness and feasibility of the SAS-CFAP algorithm are verified by experiments.
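
The two building blocks named in the abstract, K-nearest-neighbour sparsification of a Gaussian-kernel similarity matrix and golden-section search, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the bandwidth `sigma`, the use of `-inf` to mark pairs that exchange no messages, and the continuous search interval are all assumptions made for illustration. In an AP setting the diagonal would normally hold the preference values, which the caller would set separately.

```python
import math
import numpy as np


def knn_sparse_similarity(X, k, sigma=1.0):
    """Gaussian-kernel similarities restricted to each sample's k nearest neighbours.

    Entries outside the k-nearest set (including the diagonal) are -inf,
    meaning no message would be passed along them. Hypothetical helper.
    """
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(sq_dist, np.inf)           # a sample is not its own neighbour
    nearest = np.argsort(sq_dist, axis=1)[:, :k]
    sparse = np.full_like(S, -np.inf)
    rows = np.arange(X.shape[0])[:, None]
    sparse[rows, nearest] = S[rows, nearest]    # keep only the k nearest per row
    return sparse


def golden_section_search(f, lo, hi, tol=1e-5):
    """Minimise a unimodal function f on [lo, hi] with one new evaluation per step."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0      # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                             # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                                   # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0
```

Each golden-section step shrinks the interval by a constant factor while reusing one earlier evaluation, which is why it can need far fewer objective evaluations than a grid search, provided the objective is unimodal. If the searched parameter is an integer, such as a feature-set size, the candidate points would have to be rounded; that adaptation is not shown here.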
Keywords/Search Tags: Large-scale and high-dimensional data, Affinity Propagation Clustering, Core-set extraction, K-NN, SAS-Clustering algorithm