The Research And Application Of Density Peaks Clustering

Posted on: 2018-12-06  Degree: Master  Type: Thesis
Country: China  Candidate: T Li  Full Text: PDF
GTID: 2348330518986505  Subject: Computer Science and Technology
Abstract/Summary:
As an important method of data analysis, cluster analysis has been widely studied in data mining, machine learning, pattern recognition and other research fields. With the continuing development of information technology and the emergence of massive data, cluster analysis has also developed rapidly. In 2014, Alex Rodriguez et al. proposed a new density-based clustering algorithm, "Clustering by fast search and find of density peaks", referred to as Density Peaks Clustering (DPC). The DPC algorithm is based on the idea that a cluster center is surrounded by neighbors of lower local density and lies at a relatively large distance from any point of higher density. First, the algorithm selects the density peaks, namely the cluster centers, from a decision graph. Second, each remaining point is assigned to the same cluster as its nearest neighbor of higher density, and noise points are then identified automatically from the density of the border region.

The DPC algorithm is novel, simple and efficient. Its main advantages are: the number of clusters need not be specified in advance; it can find nonspherical clusters and identify noise; and its efficient point assignment strategy is well suited to large-scale data. However, it also has shortcomings: its parameters are set by manual experience rather than by a principled method; for some data sets with complex structure, it is difficult to select the right cluster centers from the decision graph; and the point assignment strategy still has defects. These shortcomings limit the adoption and application of the DPC algorithm, so it is of great significance to improve the algorithm and extend its field of application. The main work and research results of this paper are as follows:

(1) To address the difficulty of selecting cluster centers accurately from the decision graph, this paper proposes a new algorithm: density peaks clustering by automatic determination of cluster centers. The algorithm determines the cluster centers automatically from a sorting graph rather than a decision graph. First, it finds the inflection point of the sorting graph and uses it to determine the potential cluster centers; then it determines the actual cluster centers from among the potential ones; finally, it assigns each remaining data point by the same strategy as the DPC algorithm. Theoretical analysis and experimental results show that the new algorithm not only determines cluster centers automatically but also produces better clustering results.

(2) To address the inability of the DPC algorithm based on Euclidean distance to handle data sets with complex structure effectively, this paper proposes a new algorithm: density peaks clustering based on density adaptive distance. First, a density adaptive distance, which better describes the distribution structure of the data, is computed from the Euclidean distance and an adaptive similarity; it includes a local density adaptive distance and a global density adaptive distance. Second, the density adaptive distance is combined with the DPC algorithm to obtain the new algorithm. Theoretical analysis and experimental results show that the new algorithm not only handles data sets with complex structure but also produces better clustering results.

(3) To address the shortcomings of the DPC algorithm in measuring local density and in its point assignment strategy, as well as its sensitivity to the decision graph, this paper proposes a density peaks clustering algorithm based on K nearest neighbors. First, the algorithm computes the local density of each sample and assigns all remaining samples using their K nearest neighbor information. It then merges the initial sub-clusters obtained from the decision graph to produce the final clustering. Theoretical analysis and experimental results show that the new algorithm handles data sets with complex structure better without reducing the algorithm's performance.

(4) Finally, the three improved algorithms and the DPC algorithm are applied to image clustering. Image feature data is first extracted according to the characteristics of each image database, and the feature data is then clustered with the different algorithms. A comparative analysis of the clustering results supports the validity and superiority of the algorithms proposed in this paper.
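The two steps of the baseline DPC algorithm described in the abstract (select density peaks, then assign each remaining point to the cluster of its nearest higher-density neighbor) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `density_peaks`, the cutoff distance `d_c`, the simple cutoff-kernel density, and the choice of picking the `n_centers` points with the largest product of density and distance (in place of reading centers off a decision graph by eye) are all assumptions made for illustration. The noise identification step is omitted.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Illustrative DPC sketch (assumed names and parameters, not the thesis code)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that comes earlier in `order`
    # (i.e. has higher density); parent_i remembers which point that is.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no higher-density neighbor;
            # by convention it gets the largest distance to any point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Assumption: take the n_centers points with the largest rho * delta
    # as centers, instead of choosing them visually from a decision graph.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_centers]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Each remaining point inherits the label of its nearest higher-density
    # neighbor; processing in density order guarantees that label is set.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, centers = density_peaks(blob_a + blob_b, d_c=1.0, n_centers=2)
    print(labels, centers)
```

On the two small square blobs in the usage example, the middle point of each blob has the highest density and ends up as that blob's center. The sketch builds the full pairwise distance matrix, so it is only meant for small inputs.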
Keywords/Search Tags: density peaks clustering, cluster center, density adaptive distance, K nearest neighbor, image clustering