Font Size: a A A

Grid-based Density Clustering Algorithm

Posted on:2007-09-21Degree:MasterType:Thesis
Country:ChinaCandidate:X L QiuFull Text:PDF
GTID:2208360185975902Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
After developed and carried out for many years, Data Mining has absorbed many latest research results from the different subjects and has become a new research embranchment. Cluster algorithm is one of the most flourish directions of Data Mining. It has been applied abroad at other scientific fields. The typical clustering algorithms include K-Means, CLARANS, BIRCH, CURE, DBSCAN and CLIQUE. DBSCAN is one of the most popular algorithms in cluster analysis. It can discover any clusters with arbitrary shape and separate noise. But this algorithm doesn't do any precondition before clustering, so when we use it to cluster large databases or high dimensional datum, we will waste too much memory and have the high IO overhead. Furthermore, this algorithm doesn't select Minpts of points according to the actual situation. It just uses the global Minpts of points as Minpts simply, so that the clustering result is inaccurate. In order to fix this problem, I bring forward GDBSCAN algorithm in this thesis. The new algorithm cannot only solve these issues, but also can get over the defect that datum must be dense when we use the grid clustering.In the first part, I introduced the background of the cluster. Then I analyzed the disadvantage of DBSCAN and brought forward the new Grid-Based clustering algorithm—GDBSCAN. The basic procedure is that divide the spatial database into many grids, then map all datum into each grids. When querying, we can just consider parts of datum in the related grids, so it not only can avoid frequent inputting and outputting operation, but also can accelerate the operating speed. Moreover, in GDBSCAN, I created an extra variable to indicate the datum that have been clustered, so that when I do the recursion clustering, I can take advantage of received information to avoid clustering these datum. When I cluster the high dimensional data, according to the monotonic lemma, I can waive the low dimensional un-clustered datum. Then the operating speed is further accelerated. Besides, in order to select the appropriate Minpts, 1 gave a new method in this thesis.Finally, this paper developed a platform with VC and tested this algorithm from three aspects. Visualization is another research field, so I just actualized the...
Keywords/Search Tags:Data mining, Cluster, Density-Based clustering, Grid, Region Query
PDF Full Text Request
Related items