Research And Application On Distributed Clustering And Incremental Clustering Based On DBSCAN

Posted on:2017-06-10

Degree:Master

Type:Thesis

Country:China

Candidate:L Q Tian

Full Text:PDF

GTID:2348330503492872

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet and information technology, the data generated in daily life and production has grown massively. Extracting the underlying information from this mass of data is essential for guiding production and daily life, and clustering is an important foundation of data mining. Although many researchers have studied the clustering of massive data in depth and achieved a great deal, improving the efficiency and accuracy of clustering massive data remains a focus of current research. Against this background, this thesis makes the following contributions:

(1) After studying the traditional DBSCAN algorithm, and to address its heavy memory consumption and its sensitivity to parameters, a distributed clustering algorithm based on kernel density estimation and DBSCAN is proposed. First, the algorithm uses a divide-and-conquer strategy to handle massive data: the data set is distributed evenly across the nodes. Then, kernel density estimation is used on each node to determine the two parameters Eps and MinPts, and the local data is clustered with those parameters. Finally, the results from all nodes are merged. The experimental results show that the algorithm improves both the efficiency and the quality of clustering.

(2) To address the low efficiency of current clustering algorithms on incremental data, an incremental clustering algorithm based on DBSCAN is proposed. For a growing data set, only the new data is clustered with DBSCAN, and the results are then merged with the clusters of the existing data according to density-based rules. This avoids re-clustering the existing data whenever new data arrives, and greatly improves the efficiency of clustering incremental data.

(3) Building on the distributed and incremental DBSCAN algorithms and the distributed framework Storm, a network data clustering system is implemented. First, the system collects raw network data from each network device. Second, it filters and pre-cleans the raw data and converts the traffic data into a standard format for clustering. Finally, it applies distributed clustering and incremental clustering to the massive network data and generates the clustering results. The system provides one-stop clustering of network traffic data and achieves stable, efficient cluster analysis of massive network data.
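The incremental step in point (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`dbscan`, `group`, `merge`), the sample points, and the values of `eps` and `min_pts` are all hypothetical, and the merge rule used here (join two clusters when any pair of their points lies within `eps`) is an assumed simplification of the density-based merging the abstract mentions. Kernel density estimation of the parameters and the Storm deployment are left out.

```python
from math import dist


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # expand only from core points
                seeds.extend(j_neighbors)
    return labels


def group(points, labels):
    """Collect the points of each cluster, dropping noise."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    return list(clusters.values())


def merge(old_clusters, new_clusters, eps):
    """Assumed merge rule: join clusters that have points within eps of each other."""
    merged = [list(c) for c in old_clusters]
    for new in new_clusters:
        touching = [k for k, c in enumerate(merged)
                    if any(dist(p, q) <= eps for p in new for q in c)]
        combined = list(new)
        for k in touching:
            combined.extend(merged[k])
        merged = [c for k, c in enumerate(merged) if k not in touching]
        merged.append(combined)
    return merged


# Hypothetical data: two existing clusters, then a new batch next to the first one.
existing = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
new_batch = [(2, 0), (2, 1), (3, 0), (3, 1)]
eps, min_pts = 1.5, 3

old_clusters = group(existing, dbscan(existing, eps, min_pts))
new_clusters = group(new_batch, dbscan(new_batch, eps, min_pts))
result = merge(old_clusters, new_clusters, eps)
print(sorted(len(c) for c in result))
```

Only the new batch is passed through `dbscan`; the existing clusters are reused as they are and touched only by `merge`. A stricter version would merge through core points only and would re-examine old noise points that fall near new data. The same merge step could in principle combine per-node results in the distributed setting of point (1), again as an assumption.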

Keywords/Search Tags:

DBSCAN, Distributed clustering, Incremental clustering, Kernel Density Estimation, Storm

Related items

1	Research On Adaptive Clustering Algorithm Based On DBSCAN Theory
2	Study Of Distributed Real-time Data Flow Density Clustering Algorithm Based On Storm
3	Research On Dynamic Clustering And Incremental In Data Mining
4	Study On Clustering For Large Data Sets And Its Applications
5	Research On Density Clustering Algorithm Based On DBSCAN For Personalized Clustering
6	The Research On Web Text Clustering Based On DBSCAN Optimized Algorithm
7	Theory And Practice Of Ant Clustering And Partitioning-based DBSCAN Clustering
8	The Research And Improvement Of Density-based Clustering Algorithm
9	Research On DBSCAN Algorithm Based On Grid And Density-ratio
10	Research And Application Of Clustering Algorithm Based On DBSCAN