
Research On Fast Search Density Peak Clustering Algorithm Based On Streaming Computing

Posted on: 2019-07-25; Degree: Master; Type: Thesis
Country: China; Candidate: P F Wang
GTID: 2438330572959544; Subject: Computer technology
Abstract/Summary:
With the spread of intelligent terminals and sensing devices, large volumes of streaming data are being generated, and these data are of tremendous value. However, the diversity, temporal ordering, volume, and continuity of streaming data make it difficult to mine useful information from them. Cluster analysis is an unsupervised learning method in data mining: it groups data by similarity without prior knowledge of the data, which facilitates analysis and helps uncover the potential value of streaming data.

This thesis proposes two improvements to the CFSFDP (Clustering by Fast Search and Find of Density Peaks) algorithm, which was proposed by Alex Rodriguez and Alessandro Laio in 2014. First, in CFSFDP the cluster centers are selected manually from the decision graph, which plots the local density and distance of each data point. This thesis proposes a method that selects cluster centers automatically by introducing the concept of a cluster center weight and borrowing the idea of anomaly detection: cluster centers are treated as anomalous points in the data set and are identified through anomaly detection. Experiments show that the centers selected automatically by this method are largely consistent with those selected manually from the decision graph.

Second, to reduce the impact of noise on the analysis results, CFSFDP divides each cluster into a cluster core and a cluster halo and assigns noise points to the halo. However, this classification is not accurate enough. This thesis introduces the concept of the local density of a cluster and redesigns the criterion for deciding whether a data point in a cluster belongs to the cluster halo, which makes the assignment of points to the cluster core or the cluster halo more accurate.

Finally, to make CFSFDP applicable to streaming data, this thesis implements the optimized CFSFDP algorithm on the Spark Streaming platform. Experiments were performed to test and analyze the accuracy, running speed, speedup, and expansion ratio of the optimized clustering algorithm.
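The automatic center selection described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several choices are assumptions made only for illustration: the abstract does not define the cluster center weight, so the product of local density and distance (`gamma = rho * delta`) stands in for it; the anomaly detector is a simple "mean plus `k` standard deviations" rule with a hypothetical default of `k = 2.0`; the cutoff distance `d_c` is chosen by hand; and density ties are broken by index. The function names `density_peaks` and `auto_centers` are likewise hypothetical.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for each point, the two axes of the decision graph.

    rho[i]   = number of other points closer than the cutoff distance d_c
    delta[i] = distance to the nearest point of higher density
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Assumption: equal densities are ordered by index, so that exactly
        # one point in the data set has no "denser" neighbour.
        higher = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        # The globally densest point takes its largest distance to any point.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def auto_centers(rho, delta, k=2.0):
    """Treat cluster centers as outliers in gamma = rho * delta.

    Assumption: a point is an outlier when gamma exceeds mean + k * std.
    """
    gamma = [r * d for r, d in zip(rho, delta)]
    mean = sum(gamma) / len(gamma)
    std = math.sqrt(sum((g - mean) ** 2 for g in gamma) / len(gamma))
    return [i for i, g in enumerate(gamma) if g > mean + k * std]
```

Calling `density_peaks(points, d_c)` on a list of coordinate tuples and passing the result to `auto_centers` returns the indices of the candidate centers. The pairwise distance matrix makes this quadratic in the number of points, so it is suitable only for small examples; a streaming or distributed version would need a different design.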
Keywords/Search Tags:CFSFDP, Clustering, Spark Streaming, Stream computing