Research And Implementation Of Density Peaks Clustering Algorithm

Posted on:2019-12-07

Degree:Master

Type:Thesis

Country:China

Candidate:W P Sun

GTID:2428330548981383

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid spread of the Internet, we face massive amounts of data every day from society, commerce, medicine, engineering, science, and every other aspect of daily life. The explosive growth, widespread availability, and huge scale of data have brought us into a true data era. How to quickly and easily mine valuable information from this disorganized large-scale data and transform it into knowledge has become a hot research topic in modern science.

Clustering by fast search and find of density peaks (FSDP) is a density-based clustering algorithm published by Rodriguez et al. in the journal Science in 2014. Because its principle is simple, it is easy to implement, and it quickly discovers clusters of arbitrary shape, many researchers have studied and applied the algorithm since it was proposed. Its advantages are outstanding, but its shortcomings are also evident. The FSDP algorithm has the following main deficiencies: (1) the value of the truncation distance dc is difficult to determine; it relies mainly on subjective experience and lacks a principled basis for selection; (2) the selection of cluster centers requires human participation, so the objectivity and accuracy of the clustering results cannot be guaranteed; (3) computing the local density and minimum distance of each data object requires traversing all data objects in the data set, so the time complexity of the algorithm is too high for cluster analysis of large-scale data sets.

To address these problems, this thesis proposes the following improvements.

For the difficulty of determining the truncation distance dc and the need for human participation in selecting cluster centers, a clustering algorithm that combines the cuckoo search algorithm with the density peak clustering algorithm is proposed. First, the cuckoo search algorithm is used to obtain a suitable truncation distance dc for FSDP through a predefined fitness function based on the information entropy of the local density, and the local density and minimum distance of each data object in the data set are computed with this dc. Then, the cuckoo search algorithm is used to find a suitable pair of thresholds, one for local density and one for minimum distance, in the local density and minimum distance space of the data set through a predefined Rand fitness function. (To speed up the search for this pair of thresholds, an improved cuckoo search algorithm is proposed to replace the original one, which suffers from slow convergence and low search accuracy.) The local density and minimum distance of each data object are compared with this pair of thresholds, and the data objects whose local density and minimum distance both exceed the thresholds are selected as cluster centers for clustering. Experiments show that the improved clustering algorithm not only selects the correct cluster centers automatically, without human participation, but also achieves better clustering results.

For cluster analysis of large-scale data sets, where the high time complexity of the FSDP algorithm makes it too inefficient, a parallel FSDP clustering algorithm based on Spark, SFSDP, is proposed and applied to the detection of urban hotspots. The effective detection of urban hotspots verifies the practicality of the algorithm. First, the algorithm divides the data set to be clustered into multiple relatively uniform data partitions by spatial meshing; then, an improved FSDP clustering algorithm is used to perform cluster analysis on each data partition in parallel; finally, the global clustering result is generated by aggregating the local clustering results. Experimental results show that, compared with the FSDP clustering algorithm, the SFSDP algorithm can effectively perform cluster analysis of large-scale data sets and performs well in terms of accuracy and scalability.
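The core FSDP quantities described in the abstract (local density, minimum distance, and threshold-based selection of cluster centers) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `density_peaks`, the cutoff-kernel density, and the tie-breaking rule are assumptions made for illustration, and the thresholds `rho_min` and `delta_min` are passed in by hand here, whereas the thesis obtains dc and the threshold pair by cuckoo search.

```python
import numpy as np

def density_peaks(points, dc, rho_min, delta_min):
    """Basic density-peaks clustering sketch (hypothetical helper).

    points    : (n, d) array of data objects
    dc        : truncation (cutoff) distance
    rho_min   : local-density threshold for cluster centers (assumed given)
    delta_min : minimum-distance threshold for cluster centers (assumed given)
    """
    n = len(points)
    # All pairwise Euclidean distances: O(n^2) time and memory, which is
    # the cost the abstract identifies as a problem for large data sets.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Local density with a cutoff kernel: number of other points within dc.
    rho = (dist < dc).sum(axis=1) - 1

    # Visit points from densest to sparsest; ties are broken by index
    # (an assumption of this sketch).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention for the densest point: its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]   # distance to the nearest denser point
        parent[i] = j

    # Centers are points whose rho and delta both exceed the thresholds.
    is_center = (rho > rho_min) & (delta > delta_min)
    labels = np.full(n, -1)
    labels[is_center] = np.arange(is_center.sum())

    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1 and parent[i] != -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

Called on two well-separated groups of points, for example `density_peaks(pts, dc=1.0, rho_min=2, delta_min=5.0)`, the sketch picks one center per group and assigns the remaining points by following their nearest denser neighbor. Points left with label -1 indicate that the thresholds excluded the densest point of their region.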

Keywords/Search Tags:

Clustering, Cuckoo search, Density peak, Data mining, Spark

Related items

1	Research And Application Of Density Peak Clustering Algorithm Based On Spark Framework
2	Research On Density Peak-based Clustering Algorithm And Its Parallel Implementation
3	Research And Application Of Clustering By Fast Search And Find Of Density Peaks
4	Research On Path-based Of Clustering By Fast Search And Find Of Density Peak
5	Research On The Grid Density Peak Clustering Algorithm
6	Research Of Clustering Algorithm Based On Density Peak
7	Research On Hierarchical Clustering Algorithm Based On Density Peaks
8	Research And Application Of Fast Density Peak Clustering Algorithm
9	Research On Density Peak Clustering And Its Application In Community Detection
10	Research On Fast Search Density Peak Clustering Algorithm Based On Streaming Computing