Research Of Clustering Algorithm Based On Density Peak

Posted on: 2020-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Jin
Full Text: PDF
GTID: 2428330578464121
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and information technology, data sources have become increasingly diverse and data volumes immense. How to mine large-scale data and quickly obtain useful information has become a focus of recent research. This paper studies the density peak clustering (DPC) algorithm, which builds a decision graph and selects cluster centers from it in order to cluster the data. Although DPC is efficient and easy to implement, it has several problems. First, the cutoff distance dc (also called the truncation distance) must be set in advance, and at present it is set manually. An unsuitable dc leads to wrongly selected initial cluster centers, and DPC cannot correct such errors in the subsequent assignment step. Second, even when a suitable dc is chosen, it can still be difficult to pick the initial cluster centers from the decision graph, which degrades clustering quality. Third, DPC handles high-dimensional data poorly: such data are sparse and spatially complex, and the Euclidean distance that the algorithm generally uses does not accurately reflect the similarity between data points, so the clustering results are poor. Fourth, DPC is limited in recognizing noise and often fails to identify the noise points of a data set accurately. These limitations hinder the wider application of DPC, so improving the algorithm is of great significance. The research results of this paper are mainly reflected in the following aspects:

(1) DPC is susceptible to human intervention and sensitive to parameter settings: a wrong cutoff distance causes large deviations in the initial cluster centers, and even with a correct cutoff distance it is still difficult to select the initial cluster centers accurately from the decision graph. To address this limitation, an optimized density peak clustering algorithm based on an adaptive aggregation strategy is proposed. The algorithm first computes the local density of each data point from its K nearest neighbors, then compares that density with an initial threshold to select the initial cluster centers, and then assigns each remaining point to the cluster of its nearest initial cluster center. Finally, a new merging strategy combines the initial clusters using the concept of density between clusters. Experimental results show that the proposed algorithm outperforms DPC, DBSCAN, KNNDPC and KMEANS on synthetic and UCI datasets and effectively improves clustering accuracy and quality.

(2) Density-peak-based clustering is sensitive to its parameters, handles non-spherical data and complex manifold data poorly, and is limited in dealing with noise. To address these problems, an optimized density peak clustering algorithm based on natural nearest neighbors is proposed. The algorithm first determines the local density of each data point according to the concept of natural nearest neighbors, then determines the cluster centers based on the observation that density peaks have the highest local density and are separated by sparse regions, and then assigns each remaining point to the cluster of its nearest initial cluster center. Finally, a concept of similarity between clusters is proposed to handle complex manifold structures. To handle noise points, a threshold is set according to the characteristics of the natural nearest neighbors. Experimental results show that the proposed algorithm outperforms DPC, DBSCAN, KNNDPC and KMEANS on synthetic and UCI datasets and is more robust. It also performs particularly well on non-spherical data and complex manifold data.

(3) Combining the advantages of the algorithms proposed in the third and fourth chapters of this paper, a new algorithm is proposed that computes local density from the natural nearest neighbors, and that obtains initial cluster centers and merges similar clusters according to the adaptive aggregation strategy. The effectiveness of the algorithm is first verified on UCI datasets, and the algorithm is then applied to student information analysis. Experimental results show that the analysis can effectively guide the education department in tailoring education to students' different learning conditions and basic physical qualities, so that students are taught in accordance with their aptitude.
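
For orientation, the baseline DPC procedure that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard formulation (hard-cutoff local density, distance to the nearest denser point, assignment by following denser neighbors), not the improved algorithms proposed in the thesis. The function name `dpc`, the parameter `n_clusters`, and the choice to pick centers automatically by the largest product rho * delta instead of reading them off the decision graph by eye are assumptions made for the example.

```python
import numpy as np

def dpc(points, dc, n_clusters):
    """Baseline density peak clustering with a hard cutoff distance dc.

    Illustrative sketch only: centers are taken as the n_clusters points
    with the largest rho * delta (an assumption replacing manual selection
    from the decision graph).
    """
    n = len(points)
    # Pairwise Euclidean distance matrix.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density rho_i: number of other points closer than dc.
    rho = (dist < dc).sum(axis=1) - 1

    # Visit points from highest to lowest density (stable order on ties).
    order = np.argsort(-rho, kind="stable")
    rank_of = np.empty(n, dtype=int)
    rank_of[order] = np.arange(n)

    # delta_i: distance to the nearest point that precedes i in that order;
    # parent_i remembers which point that is.
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Decision-graph score; ties are broken in favor of denser points.
    gamma = rho * delta
    centers = np.lexsort((rank_of, -gamma))[:n_clusters]

    # Each non-center inherits the label of its nearest denser neighbor.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, centers, labels
```

Plotting `delta` against `rho` gives the decision graph. Because `rho` depends directly on `dc`, a poorly chosen cutoff changes which points stand out in that graph, and the single-pass label propagation has no way to undo a wrong center, which is the sensitivity the thesis sets out to address.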
Keywords/Search Tags: data mining, clustering algorithm, density peak, merging strategy, natural nearest neighbor