Research And Implementation Of Density Peaks Clustering Algorithm

Posted on: 2019-12-07 | Degree: Master | Type: Thesis
Country: China | Candidate: W P Sun | Full Text: PDF
GTID: 2428330548981383 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid spread of the Internet around the world, we face massive amounts of data from society, commerce, medicine, engineering, science, and every aspect of daily life. The explosive growth, widespread availability, and sheer scale of data have brought us into a true data era. How to quickly and easily mine valuable information from such disorganized large-scale data and turn it into knowledge has become a hot research topic in modern science.

Clustering by fast search and find of density peaks (FSDP) is a density-based clustering algorithm published by Rodriguez et al. in the journal Science in 2014. Because its principle is simple, it is easy to implement, and it quickly discovers clusters of arbitrary shape, many researchers have studied and applied the algorithm since it was proposed. Its shortcomings, however, are equally evident. The FSDP algorithm has three main deficiencies: (1) the value of the truncation distance dc is difficult to determine; it relies mainly on subjective experience and lacks a principled selection basis; (2) the selection of cluster centers requires human participation, so the objectivity and accuracy of the clustering results cannot be guaranteed; (3) computing the local density and minimum distance of each data object requires traversing all data objects in the data set, so the time complexity of the algorithm is too high for cluster analysis of large-scale data sets.

To address these problems, this thesis proposes corresponding improvements. For the difficulty of determining the truncation distance dc and the need for human participation in selecting cluster centers, a clustering algorithm that combines the cuckoo search algorithm with the density peak clustering algorithm is proposed. First, the cuckoo search algorithm is used to obtain a suitable truncation distance dc for FSDP through a predefined fitness function based on the information entropy of the local density, and the local density and minimum distance of each data object in the data set are computed with this dc. Then, the cuckoo search algorithm is used to find a suitable pair of local density and minimum distance thresholds for FSDP in the local density and minimum distance space of the data set, through a predefined Rand fitness function. (To speed up the search for this pair of thresholds, an improved cuckoo search algorithm is proposed to replace the original one, which suffers from slow convergence and low search accuracy.) The local density and minimum distance of each data object are compared with this pair of thresholds, and the data objects whose local density and minimum distance both exceed the thresholds are selected as cluster centers for clustering. Experiments show that the improved clustering algorithm not only selects the correct cluster centers automatically, without human participation, but also achieves better clustering results.

For cluster analysis of large-scale data sets, where the high time complexity of FSDP makes the algorithm too inefficient, a parallel FSDP clustering algorithm based on Spark, called SFSDP, is proposed and applied to the detection of urban hotspots. The practicality of the algorithm is verified by its effective detection of urban hotspots. First, the algorithm divides the data set to be clustered into multiple relatively uniform data partitions by spatial gridding; then, an improved FSDP clustering algorithm performs cluster analysis on each data partition in parallel; finally, the global clustering result is generated by aggregating the local clustering results. Experimental results show that, compared with the FSDP clustering algorithm, the SFSDP algorithm can effectively perform cluster analysis of large-scale data sets and performs well in terms of accuracy and scalability.
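
The two quantities the abstract relies on, local density and minimum distance, and the threshold-based choice of cluster centers can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the cutoff-kernel density of the original FSDP formulation, and the function name `fsdp` and the parameters `rho_min` and `delta_min` are hypothetical. In the thesis, dc and the threshold pair are found by cuckoo search; here they are simply passed in by hand.

```python
import numpy as np

def fsdp(points, dc, rho_min, delta_min):
    """Density peaks clustering with threshold-based center selection.

    dc, rho_min and delta_min are supplied by the caller in this sketch;
    the cuckoo search described in the abstract is not reproduced here.
    """
    n = len(points)
    # Pairwise Euclidean distances: O(n^2) time and memory, which is the
    # cost the abstract identifies as the bottleneck on large data sets.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Local density (cutoff kernel): number of other points closer than dc.
    rho = (dist < dc).sum(axis=1) - 1
    # Visit points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention for the densest point: its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]  # points earlier in the density ordering
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]  # minimum distance to a denser point
        nearest_denser[i] = j
    # Centers: both local density and minimum distance exceed the thresholds.
    is_center = (rho > rho_min) & (delta > delta_min)
    labels = np.full(n, -1)  # -1 means "not assigned"
    labels[is_center] = np.arange(is_center.sum())
    # Each remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1 and nearest_denser[i] != -1:
            labels[i] = labels[nearest_denser[i]]
    return rho, delta, labels
```

Calling `fsdp(points, dc, rho_min, delta_min)` on an `(n, d)` array returns the per-point local density, the per-point minimum distance, and one cluster label per point. The full distance matrix makes the quadratic cost explicit, which is why the abstract proposes partitioning the data and clustering the partitions in parallel.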
Keywords/Search Tags:Clustering, Cuckoo search, Density peak, Data mining, Spark