Research On The Grid Density Peak Clustering Algorithm

Posted on: 2020-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: X G Li
GTID: 2438330596471162
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information age, data has become increasingly important in every field of society, and data analysis, data mining, and data applications now account for a large share of the work in those fields. Clustering is an important research method in data mining and is widely used in social networks, text analysis, recommendation systems, and transaction fraud detection. Density peak clustering (DPC) is a clustering algorithm proposed by Alex Rodriguez et al. in 2014. It can find clusters of arbitrary shape and determine the correct number of clusters, and it is widely used in image recognition and bioinformatics. However, on data sets with a large volume of data, both its running time and its memory consumption increase. To address these shortcomings, the thesis proposes a density peak clustering algorithm that applies a gridding method based on grid data centers to the input data set and then clusters the resulting grid objects on the basis of the DPC algorithm. Because it clusters grid objects rather than individual data points, the algorithm significantly reduces both the running time and the memory consumed during computation.

Spectral clustering is particularly well suited to more complex manifold data sets. However, it requires the number of clusters to be set manually, and it is inefficient on large-scale data sets. To address these problems, the thesis proposes a spectral clustering algorithm based on grid density peak optimization. The algorithm adopts the gridding idea and uses the geodesic distance to measure the similarity between grid objects. The grid objects are processed with the DPC algorithm to determine the initial cluster centers and the number of clusters, and this number is passed to the spectral clustering algorithm as its initial parameter to complete the clustering of the grid objects and of all data points. On the one hand, the algorithm overcomes the inability of spectral clustering to determine the number of clusters adaptively; on the other hand, it reduces the time and memory consumed when constructing the feature space. Both theoretical analysis and extensive experimental results show that the method achieves good validity and accuracy on multiple real data sets.
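The grid-based density peak idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation, whose details the abstract does not give. The function name `grid_dpc`, the parameters `cell_size`, `dc`, and `n_clusters`, the count-weighted Gaussian density, and the choice of peaks by the largest product of density and distance are all assumptions made for illustration. In particular, the sketch takes the number of clusters as an input, whereas the thesis derives it from the DPC step.

```python
import numpy as np

def grid_dpc(X, cell_size, dc, n_clusters):
    """Density-peak clustering over occupied grid cells (illustrative sketch)."""
    X = np.asarray(X, dtype=float)

    # 1. Bin the points into axis-aligned cells and keep only occupied cells.
    idx = np.floor((X - X.min(axis=0)) / cell_size).astype(int)
    _, cell_of_point, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True)
    cell_of_point = cell_of_point.ravel()
    m = counts.size

    # Assumed "grid data center": the mean of the points inside each cell.
    centers = np.zeros((m, X.shape[1]))
    np.add.at(centers, cell_of_point, X)
    centers /= counts[:, None]

    # 2. Local density of each cell: Gaussian kernel over cell centers,
    #    weighted by how many points each cell holds (an assumption).
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    rho = (counts[None, :] * np.exp(-(d / dc) ** 2)).sum(axis=1)

    # 3. delta: distance to the nearest cell of higher density.
    order = np.argsort(-rho, kind="stable")  # densest cell first
    delta = np.zeros(m)
    parent = np.full(m, -1)
    delta[order[0]] = d[order[0]].max()      # DPC convention for the densest cell
    for rank in range(1, m):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(d[i, higher])]
        delta[i], parent[i] = d[i, j], j

    # 4. Peaks: cells with the largest rho * delta. The stable sort over the
    #    density order guarantees the densest cell is always selected.
    gamma = rho * delta
    peaks = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]
    cell_label = np.full(m, -1)
    cell_label[peaks] = np.arange(peaks.size)

    # 5. Every other cell inherits the label of its nearest denser cell;
    #    each point then takes the label of the cell it falls in.
    for i in order:
        if cell_label[i] == -1:
            cell_label[i] = cell_label[parent[i]]
    return cell_label[cell_of_point], centers[peaks]
```

Calling `labels, peaks = grid_dpc(X, cell_size=0.5, dc=1.0, n_clusters=2)` on an `(n, d)` array returns one label per point and the centers of the selected peak cells. The pairwise distance matrix covers the occupied cells, not the individual points, which is where the time and memory savings claimed in the abstract would come from.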
Keywords/Search Tags: Data mining, clustering, density peak clustering, spectral clustering, grid data center, geodesic distance