Density Peak Clustering Analysis Based On Similarity And Its Application

Posted on:2022-01-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y Liu

GTID:2518306521994979

Subject:Computer technology

Abstract/Summary:

Cluster analysis is one of the main components of data mining. Its task is to use similarity relationships within the data to divide a dataset into multiple unrelated groups. It is widely used in agriculture, astronomy, industry, medicine, and physics. The density peak clustering (DPC) algorithm has the advantages of identifying cluster centers, fast cluster allocation, and simple implementation. However, it also has shortcomings: the cutoff distance parameter must be set manually, the cluster allocation strategy is inappropriate, and non-spherical datasets cannot be identified. These problems degrade the clustering results. This thesis studies the problems of density calculation and cluster allocation. The main research results are as follows:

(1) A similarity-based density peak clustering algorithm is proposed. First, the density is calculated from the k nearest neighbors, and the neighborhood density is obtained from the ratio of nearest-neighbor density to distance. This yields a density calculation method that adapts to the data distribution and effectively addresses the arbitrariness of choosing the DPC cutoff distance. Second, a nearest-neighbor-based similarity measure is defined using shared nearest neighbors, reverse nearest neighbors, and the density ratio, and the k nearest neighbors are then replaced by the most similar nearest neighbors. Finally, a cluster expansion method based on distance and density is proposed that uses the most similar nearest-neighbor data objects. Experiments on UCI datasets show the effectiveness of the algorithm.

(2) A density peak clustering algorithm based on Spark is proposed. First, the algorithm distributes the data evenly across the Spark cluster. Then the shared nearest neighbors, densities, and similar nearest neighbors are cached in memory, and multiple nodes take part in the computation. Finally, the similarity is used to expand the clusters. By exploiting the properties of the neighborhood density value, the similarity-based cluster expansion method, and Spark's system architecture and in-memory computation, both the distinguishability of the clusters and the efficiency of the algorithm are improved. Experiments on artificial datasets verify that the algorithm scales well.

(3) Based on the above results, an astronomical spectrum clustering system is developed in Python. The results of running the system show that it provides an effective way to analyze spectral data.
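For orientation, the quantities the abstract mentions can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract gives no formulas, so the density used here (inverse mean distance to the k nearest neighbors), the choice of centers by the largest product of density and distance, and all function names are assumptions made for illustration. The shared-nearest-neighbor count is shown only as one ingredient of a similarity measure; the reverse-nearest-neighbor and density-ratio terms are not reproduced.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_indices(D, k):
    """Indices of the k nearest neighbors of each point, excluding itself."""
    masked = D.copy()
    np.fill_diagonal(masked, np.inf)  # a point is never its own neighbor
    return np.argsort(masked, axis=1, kind="stable")[:, :k]

def knn_density(D, nbrs):
    """Assumed density: inverse of the mean distance to the k nearest neighbors."""
    mean_dist = np.take_along_axis(D, nbrs, axis=1).mean(axis=1)
    return 1.0 / (mean_dist + 1e-12)

def delta_and_parent(D, rho):
    """For each point, distance to (and index of) the nearest denser point.

    Points are visited in descending density order, so ties are broken by
    that order. The densest point gets its largest distance as delta.
    """
    n = len(rho)
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        earlier = order[:pos]
        j = earlier[np.argmin(D[i, earlier])]
        delta[i], parent[i] = D[i, j], j
    return delta, parent, order

def assign_labels(order, parent, centers):
    """Baseline DPC allocation: inherit the label of the nearest denser point."""
    labels = np.full(len(order), -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] == -1 and parent[i] >= 0:
            labels[i] = labels[parent[i]]
    return labels

def shared_neighbor_counts(nbrs):
    """Entry (i, j) counts the neighbors that points i and j have in common."""
    n = nbrs.shape[0]
    member = np.zeros((n, n), dtype=int)
    np.put_along_axis(member, nbrs, 1, axis=1)
    return member @ member.T

# Toy data (hypothetical): two cross-shaped groups centered at (0, 0) and (10, 10).
offsets = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
X = np.vstack([offsets, offsets + 10.0])

D = pairwise_distances(X)
nbrs = knn_indices(D, k=2)
rho = knn_density(D, nbrs)
delta, parent, order = delta_and_parent(D, rho)
gamma = rho * delta                                  # large for dense, isolated points
centers = np.argsort(-gamma, kind="stable")[:2]      # assume two clusters are wanted
labels = assign_labels(order, parent, centers)
snn = shared_neighbor_counts(nbrs)
print(centers, labels)
```

Because the density here depends only on the neighbor count k, no cutoff distance has to be chosen. The allocation step shown is the baseline DPC rule that the abstract criticizes, not the similarity-based cluster expansion the thesis proposes.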

Keywords/Search Tags:

Clustering, Density peak, Similarity, Astronomical spectrum

Related items

1	Research On Density Peaks Clustering
2	Manifold Density Peak Clustering Algorithm And Its Application Of Weibo Text Classification
3	Research And Improvement On Density Peak Clustering Algorithm And Application For Earthquake Classification
4	Research And Application Of Clustering Algorithm Based On Density Peak
5	Research On Application And Optimization Of Density Peak Clustering
6	Research On Improved Density Peak Clustering Algorithm
7	Research On Density Peak-based Clustering Algorithm And Its Parallel Implementation
8	Research And Application Of Density Peak Clustering Algorithm Based On Density Decay Graph
9	Research And Application Of Density Peak Clustering Algorithm
10	The Research And Application Of Density Peak Clustering Algorithm