Grid-based Density Clustering Algorithm

Posted on:2007-09-21

Degree:Master

Type:Thesis

Country:China

Candidate:X L Qiu

Full Text:PDF

GTID:2208360185975902

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

After developed and carried out for many years, Data Mining has absorbed many latest research results from the different subjects and has become a new research embranchment. Cluster algorithm is one of the most flourish directions of Data Mining. It has been applied abroad at other scientific fields. The typical clustering algorithms include K-Means, CLARANS, BIRCH, CURE, DBSCAN and CLIQUE. DBSCAN is one of the most popular algorithms in cluster analysis. It can discover any clusters with arbitrary shape and separate noise. But this algorithm doesn't do any precondition before clustering, so when we use it to cluster large databases or high dimensional datum, we will waste too much memory and have the high IO overhead. Furthermore, this algorithm doesn't select Minpts of points according to the actual situation. It just uses the global Minpts of points as Minpts simply, so that the clustering result is inaccurate. In order to fix this problem, I bring forward GDBSCAN algorithm in this thesis. The new algorithm cannot only solve these issues, but also can get over the defect that datum must be dense when we use the grid clustering.In the first part, I introduced the background of the cluster. Then I analyzed the disadvantage of DBSCAN and brought forward the new Grid-Based clustering algorithm—GDBSCAN. The basic procedure is that divide the spatial database into many grids, then map all datum into each grids. When querying, we can just consider parts of datum in the related grids, so it not only can avoid frequent inputting and outputting operation, but also can accelerate the operating speed. Moreover, in GDBSCAN, I created an extra variable to indicate the datum that have been clustered, so that when I do the recursion clustering, I can take advantage of received information to avoid clustering these datum. When I cluster the high dimensional data, according to the monotonic lemma, I can waive the low dimensional un-clustered datum. Then the operating speed is further accelerated. Besides, in order to select the appropriate Minpts, 1 gave a new method in this thesis.Finally, this paper developed a platform with VC and tested this algorithm from three aspects. Visualization is another research field, so I just actualized the...

Keywords/Search Tags:

Data mining, Cluster, Density-Based clustering, Grid, Region Query

PDF Full Text Request

Related items

1	The Study Of Clustering Algorithm Based On Density
2	Research On An Effective Self Adapted Grid-Density Based Clustering
3	Research On Performance Optimization And Parameter Selection Of Density Clustering Algorithm
4	Large-scale Scientific Data Mining Density Clustering Algorithm
5	Application Of Grid And Density Based Clustering Algorithm In Data Mining
6	Research On Data Stream Clustering Algorithm Based On Density Grid
7	Research On Data Stream Clustering Algorithm Based On Double-layer Grid And Density
8	Research On The Grid Density Peak Clustering Algorithm
9	Grid Clustering Research Based On Impact Factor Of Grid Density
10	The Study Of The Weighted Average Density Self-adaptive Clustering Algorithm Based On Grid And Its Application