
Research And Application Of Clustering By Fast Search And Find Of Density Peaks

Posted on: 2020-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Lv
GTID: 2428330599976460
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the rapid development of the Internet has led every industry to generate massive amounts of data, ushering in the era of big data. Extracting the potential value of stored data, in order to promote the development of industry, commerce, transportation, and medicine and to contribute to social progress, has therefore become an important research topic in data mining. Cluster analysis is one of the important techniques in data mining and belongs to the unsupervised learning branch of machine learning. In 2014, Alex Rodriguez and Alessandro Laio published the paper "Clustering by fast search and find of density peaks" (CFSFDP) in Science. It takes a different approach from earlier clustering algorithms: it overcomes the limitation that distance-based clustering algorithms can only find roughly circular clusters, it can find clusters of arbitrary shape, and it is insensitive to noise points. This thesis proposes improvements to CFSFDP in the following respects:

(1) When the CFSFDP algorithm determines the cluster centers, the choice depends on subjective judgment, which makes the clustering less rigorous and less accurate. A density peak clustering algorithm based on a positive sequence iterative selection strategy is proposed. First, because the variables in the decision function are not uniformly distributed, they are normalized so that the two parameters of the decision function, ρ (local density) and δ (distance), are evenly distributed. Then, when determining the cluster centers, a positive sequence iterative selection strategy searches for the "elbow point" according to the trend in the number of cluster core points, and clustering is completed by using the points before the "elbow point" as the cluster centers. The experimental results show that the proposed algorithm can cluster data sets of arbitrary distribution without increasing the time complexity, which effectively improves the adaptability and quality of the clustering.

(2) Because computing the decision function parameters ρ (local density) and δ (distance) requires traversing the entire data set, the time complexity of the CFSFDP algorithm is too high. Building on the density peak clustering algorithm with the positive sequence iterative selection strategy, a parallel density peak clustering algorithm based on Spark is proposed. The algorithm first divides the data set into intervals, so that the data of each interval are traversed only within that interval and ρ and δ are computed independently in each interval. The density peak clustering algorithm with the iterative selection strategy is then run on each node. Finally, the per-interval results are re-clustered to complete the clustering of the whole data set. The experimental results show that the Spark-based parallel algorithm requires less computation and less time than the density peak clustering algorithm based on the positive sequence iterative selection strategy, which greatly improves the computational efficiency of clustering.

(3) To test the validity and practicality of the algorithm, the Spark-based parallel density peak clustering algorithm is applied to take-out (food delivery) data. First, the latitude and longitude of each take-out seller are clustered, and then the time points of each order. The experimental results and analysis show that the algorithm provides a sound marketing basis for the stores' business strategy and management, and they also reflect the practical value of the algorithm.

Finally, the thesis is summarized and directions for further research on the algorithm are introduced.
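As a rough illustration of the quantities the abstract refers to, the sketch below computes ρ (local density) and δ (distance to the nearest denser point) as defined in the original CFSFDP paper, then forms a normalized decision value γ = ρ·δ. The min-max normalization, the function names, and the cutoff `d_c` are assumptions made for this example; the abstract does not state which normalization the thesis uses, and the positive sequence iterative "elbow point" search is not reproduced here.

```python
import numpy as np

def density_peaks(points, d_c):
    """Return (rho, delta) for each point, using the cutoff kernel of CFSFDP."""
    # Full pairwise Euclidean distance matrix: the O(n^2) traversal
    # that the abstract identifies as the bottleneck.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # rho: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.empty(len(points))
    for i in range(len(points)):
        denser = rho > rho[i]
        if denser.any():
            # delta: distance to the nearest point of higher density.
            delta[i] = dist[i, denser].min()
        else:
            # Highest-density points (including ties) get the largest distance.
            delta[i] = dist[i].max()
    return rho, delta

def min_max(x):
    """Assumed normalization: rescale values to [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def decision_values(rho, delta):
    """gamma = normalized rho * normalized delta; large gamma suggests a center."""
    return min_max(rho) * min_max(delta)

# Two small, well-separated groups; indices 0 and 5 are their middle points.
offsets = np.array([[0, 0], [0.1, 0], [-0.1, 0], [0, 0.1], [0, -0.1]])
points = np.vstack([offsets, offsets + 10.0])
rho, delta = density_peaks(points, d_c=0.12)
gamma = decision_values(rho, delta)
centers = np.argsort(gamma)[-2:]  # the two largest decision values
```

In this toy data the two middle points have both high density and a large δ, so they stand out in γ. Choosing how many of the top-ranked points to accept as centers is the step the thesis automates with its elbow-point strategy.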
Keywords/Search Tags: clustering algorithm, density peak, positive sequence iteration, parallel computing, data mining