Research and Application of Distributed Clustering and Incremental Clustering Based on DBSCAN

Posted on: 2017-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Tian
Full Text: PDF
GTID: 2348330503492872
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and information technology, the data generated in daily life and production has grown massively. Extracting the underlying information from this mass of data is essential for guiding production and daily life, and clustering is an important foundation of data mining. Although many researchers have studied massive-data clustering in depth and achieved a great deal, improving the efficiency and accuracy of clustering massive data remains a focus of current research. Against this background, this thesis makes the following contributions:

(1) After studying the traditional DBSCAN algorithm, and to address its heavy memory consumption and its sensitivity to parameters, a distributed clustering algorithm based on kernel density estimation and DBSCAN is proposed. First, the algorithm uses a divide-and-conquer strategy to handle massive data: the data set is distributed evenly across the nodes. Then, kernel density estimation is used on each node to determine the two parameters Eps and MinPts, and the node's data is clustered with those parameters. Finally, the results from all nodes are merged. The experimental results show that the algorithm improves both the efficiency and the quality of clustering.

(2) To address the low efficiency of current clustering algorithms on incremental data, an incremental clustering algorithm based on DBSCAN is proposed. For a growing data set, only the new data is clustered with DBSCAN; the results are then merged with the clusters of the existing data according to density-reachability rules. This avoids re-clustering the existing data whenever new data arrives and greatly improves the efficiency of clustering incremental data.

(3) Building on the distributed and incremental DBSCAN algorithms and the distributed framework Storm, a network data clustering system is implemented. First, the system collects raw network data from each network device. Second, it filters and pre-cleans the raw data and converts the traffic data into a standard format for clustering. Finally, it applies distributed clustering and incremental clustering to the massive network data and generates the clustering results. The system provides one-stop clustering of network traffic data and achieves stable and efficient cluster analysis of massive network data.
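The incremental idea in item (2) can be illustrated with a minimal sketch: only the new points are clustered with DBSCAN, and each new cluster is then attached to an existing cluster by a density-based rule. The abstract does not give the thesis's actual merge rule, so the rule used here (a new cluster joins an existing cluster if any of its points lies within Eps of an already clustered point) is an assumption, as are the function names `dbscan` and `incremental_cluster`. The sketch also ignores two cases a full method would handle: a new cluster that bridges two existing clusters, and new noise points that lie next to an existing cluster.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with a brute-force neighbour search; one label per point."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
        cluster += 1
    return labels


def incremental_cluster(old_points, old_labels, new_points, eps, min_pts):
    """Cluster only new_points, then merge the result into the old labelling.

    Assumed merge rule (not taken from the thesis): a new cluster adopts the
    id of the first old cluster that has a point within eps of it.
    """
    new_labels = dbscan(new_points, eps, min_pts)
    next_id = max(old_labels, default=NOISE) + 1
    mapping = {}  # new cluster id -> final cluster id

    for p, lab in zip(new_points, new_labels):
        if lab == NOISE or lab in mapping:
            continue
        for q, old_lab in zip(old_points, old_labels):
            if old_lab != NOISE and dist(p, q) <= eps:
                mapping[lab] = old_lab
                break

    merged = []
    for lab in new_labels:
        if lab == NOISE:
            merged.append(NOISE)
            continue
        if lab not in mapping:  # no old cluster nearby: allocate a fresh id
            mapping[lab] = next_id
            next_id += 1
        merged.append(mapping[lab])
    return list(old_labels) + merged
```

The existing points are never re-clustered: their labels are passed in and returned unchanged, which is where the efficiency gain described above comes from. The brute-force neighbour search is quadratic and is used only to keep the sketch short.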
Keywords/Search Tags:DBSCAN, Distributed clustering, Incremental clustering, Kernel Density Estimation, Storm