Density Peak Clustering Analysis Based On Similarity And Its Application

Posted on: 2022-01-25   Degree: Master   Type: Thesis
Country: China   Candidate: Y Liu
GTID: 2518306521994979   Subject: Computer technology
Abstract/Summary:
Clustering analysis is one of the main components of data mining. Its task is to use the similarity relationships within the data to divide a dataset into multiple non-overlapping groups. It is widely used in agriculture, astronomy, industry, medicine and physics. The density peak clustering algorithm (DPC) identifies cluster centers, assigns points to clusters quickly and is simple to implement. However, it has several shortcomings: the cut-off distance parameter must be set manually, its cluster assignment strategy can be inappropriate, and it cannot identify clusters in non-spherical datasets. These problems degrade the clustering results. This thesis studies the problems of density calculation and cluster assignment. The main results are as follows:

(1) A density peak clustering algorithm based on similarity is proposed. First, the density of each point is calculated from its k nearest neighbors, and its neighborhood density is obtained from the ratio of the nearest-neighbor density to distance. This gives a density calculation that adapts to the data distribution and removes the arbitrariness of choosing DPC's cut-off distance. Second, a nearest-neighbor-based similarity measure is defined from shared nearest neighbors, reverse nearest neighbors and the density ratio, and the k nearest neighbors are replaced by the most similar nearest neighbors. Finally, a cluster expansion method based on distance and density is proposed that works through these most similar nearest neighbors. Experiments on UCI datasets show the effectiveness of the algorithm.

(2) A density peak clustering algorithm based on Spark is proposed. First, the algorithm partitions the data evenly across the Spark cluster. Then the shared nearest neighbors, densities and similar nearest neighbors are cached in memory, and multiple nodes take part in the computation. Finally, clusters are expanded by similarity. By combining the neighborhood density values, the similarity-based cluster expansion method, and Spark's distributed architecture and in-memory computation, the algorithm improves both the separation between clusters and the efficiency of the computation. Experiments on artificial datasets verify that the algorithm scales well.

(3) Building on these results, an astronomical spectrum clustering system is developed in Python. Running the system shows that it provides an effective approach to spectral data analysis.
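For reference, the baseline DPC whose cut-off distance and one-step allocation the thesis criticizes can be sketched in a few lines. This is a minimal pure-Python sketch of the standard formulation by Rodriguez and Laio (2014), not code from the thesis: the density ρ counts neighbors strictly closer than the cut-off distance `dc`, δ is the distance to the nearest denser point, centers are taken from the largest γ = ρ·δ, and every other point copies the label of its nearest denser "parent". Always keeping the densest point as a center is a convention chosen here so that every parent chain ends at a labeled point; the two toy blobs are made up for illustration.

```python
import math


def dpc(points, dc, n_clusters):
    """Baseline DPC (Rodriguez and Laio, 2014) with a hard cut-off kernel.

    Returns the local densities and a cluster label for every point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points strictly closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density, stable
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # usual convention for the densest point
            continue
        # delta: distance to the nearest point ranked before i (denser, or tied
        # and earlier in the stable sort); that point becomes i's parent.
        j = min(order[:rank], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j
    # Centres: the densest point plus the largest gamma = rho * delta values.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centres = [order[0]] + rest[:n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # One-step allocation in density order: copy the parent's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, labels


if __name__ == "__main__":
    # Two made-up, well-separated 5-point blobs.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(dpc(pts, 1.0, 2)[1])  # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(dpc(pts, 0.5, 2)[0])  # every density is 0 when dc is too small
```

With `dc = 1.0` the two blobs are recovered. With `dc = 0.5` every ρ is zero, so the decision values carry no density information and the first blob ends up split across two labels. This sensitivity to a manually chosen `dc` is the weakness that a k-nearest-neighbor density is meant to remove.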
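The abstract names the ingredients of result (1) but not its exact formulas, so the following is only a hypothetical sketch of how the three steps could fit together. All of the following are assumptions, not the thesis's definitions: the kNN density `k / Σ d(i, j)` over the k nearest neighbors; a similarity that is non-zero only for mutual neighbors (j is a kNN of i and i is a kNN of j, i.e. j is also a reverse nearest neighbor of i) and equals the shared-neighbor fraction times the density ratio; the `threshold = 0.5` cut for "most similar" neighbors; and breadth-first expansion from the centers, with DPC's parent rule as a fallback. The names `knn_lists` and `similarity_dpc` are illustrative.

```python
import math
from collections import deque


def knn_lists(points, k):
    """Indices of the k nearest neighbours of every point (brute force)."""
    n = len(points)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))[:k]
            for i in range(n)]


def similarity_dpc(points, k, n_clusters, threshold=0.5):
    """Hypothetical similarity-based DPC sketch; all formulas are assumptions."""
    n = len(points)

    def d(i, j):
        return math.dist(points[i], points[j])

    knn = knn_lists(points, k)
    near = [set(nb) | {i} for i, nb in enumerate(knn)]  # neighbourhood incl. self
    # ASSUMPTION: kNN density = k / (sum of distances to the k neighbours),
    # which replaces DPC's cut-off distance dc.
    rho = [k / (sum(d(i, j) for j in knn[i]) + 1e-12) for i in range(n)]
    # Reverse nearest neighbours: rnn[i] = {j : i is among j's k nearest}.
    rnn = [set() for _ in range(n)]
    for j in range(n):
        for i in knn[j]:
            rnn[i].add(j)

    def sim(i, j):
        # ASSUMPTION: non-zero only for mutual neighbours; shared-neighbour
        # fraction multiplied by the density ratio (always <= 1).
        if j not in knn[i] or j not in rnn[i]:
            return 0.0
        shared = len(near[i] & near[j]) / (k + 1)
        return shared * min(rho[i], rho[j]) / max(rho[i], rho[j])

    # Standard DPC delta and parent, computed on the kNN density.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d(i, j) for j in range(n))
            continue
        parent[i] = min(order[:rank], key=lambda m: d(i, m))
        delta[i] = d(i, parent[i])
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centres = [order[0]] + rest[:n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # ASSUMPTION: breadth-first expansion from the centres through neighbours
    # whose similarity reaches the threshold.
    queue = deque(centres)
    while queue:
        i = queue.popleft()
        for j in knn[i]:
            if labels[j] == -1 and sim(i, j) >= threshold:
                labels[j] = labels[i]
                queue.append(j)
    # Points the expansion did not reach fall back to DPC's parent rule.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(similarity_dpc(pts, 3, 2))  # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Gating the expansion on mutual neighbors is intended to stop a cluster from growing across a sparse bridge of points. Only the points the expansion cannot reach are handled by the one-step parent rule, which limits how far a single wrong assignment can propagate.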
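The data flow of result (2) (split the data evenly, let several workers compute neighbor information in parallel, then merge) can be illustrated on a single machine. The sketch below is not Spark code. A `ThreadPoolExecutor` stands in for the executors only to show how the work is partitioned, and because of Python's GIL it is not a performance model. It assumes a round-robin split as the meaning of "evenly" and uses the k-nearest-neighbor lists as the per-partition work, since shared neighbors and densities are both derived from them. In PySpark the analogous pieces would roughly be `SparkContext.broadcast` for the read-only full dataset and `RDD.mapPartitions` for the per-partition computation. `knn_for_partition` and `parallel_knn` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def knn_for_partition(partition, all_points, k):
    """kNN lists for the (index, point) pairs of one partition, searched
    against the full, read-only ('broadcast') dataset."""
    out = {}
    for i, p in partition:
        # Sort by (distance, index) so ties break the same way everywhere.
        others = sorted((math.dist(p, q), j)
                        for j, q in enumerate(all_points) if j != i)
        out[i] = [j for _, j in others[:k]]
    return out


def parallel_knn(points, k, n_partitions=4):
    indexed = list(enumerate(points))
    # Split the data evenly: partition r gets every n_partitions-th point.
    partitions = [indexed[r::n_partitions] for r in range(n_partitions)]
    result = {}
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        for part in pool.map(knn_for_partition, partitions,
                             [points] * n_partitions, [k] * n_partitions):
            result.update(part)  # merge per-partition results, like collect()
    return [result[i] for i in range(len(points))]
```

Because every partition searches the whole dataset, the merged result is identical to a sequential brute-force kNN, whatever the number of partitions. Only the cost is divided among workers.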
Keywords/Search Tags: Clustering, Density peak, Similarity, Astronomical spectrum