Research On The Grid-based And Density-based Clustering Algorithms

Posted on:2008-05-14

Degree:Master

Type:Thesis

Country:China

Candidate:N Zhang

Full Text:PDF

GTID:2178360242467330

Subject:Software engineering

Abstract/Summary:

Data clustering is the process of finding similarities between data objects and grouping them into clusters according to a certain criterion. Its aim is to ensure that data within a cluster have high similarity and data in different clusters have high dissimilarity. Clustering is an essential way to understand the world around us, and it is an important task in the data mining field. Existing clustering algorithms fall mainly into partitioning methods, hierarchical methods, density-based methods, grid-based methods, field topology methods, etc. Grid-based methods divide the data space into non-overlapping cells and cluster these cells instead of the original data points. This kind of method achieves high time efficiency, but the cluster quality is unsatisfactory, especially at the cluster boundaries. Density-based clustering methods define clusters as densely populated regions and treat sparsely populated regions as noise. This kind of method can achieve high clustering quality, but its time cost is very high.
To solve this problem, this thesis proposes a new method (GDC4P) that combines the ideas of density-based and grid-based methods: it breaks up the cells located at the cluster boundaries and processes the data points in them again for better precision. Unlike existing algorithms, the proposed algorithm computes summary information for each cell and treats it as the cell's property, clusters the core cells that are neighbors, then retrieves the suspected cells located at the cluster boundaries and reloads the original information of the data points they contain. This method gives good clustering precision at the cluster boundaries.
Experimental evaluation shows that this method is more efficient than CLIQUE and has time complexity within O(n). In addition, there is a substantial improvement in time efficiency over DBSCAN.
This hybrid grid-based and density-based method suggests a new way to improve existing algorithms, and the idea of multi-level processing for precision is significant in both theory and practice.
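
The general idea described above (bin points into cells, cluster neighboring core cells, then re-examine the points in sparse cells at the cluster boundaries) can be illustrated with a minimal Python sketch. This is not the thesis's GDC4P implementation, whose details are not given in the abstract. The function name `grid_density_cluster`, the parameters `cell_size`, `min_pts` and `eps`, and the nearest-core-point rule used for boundary points are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import product
import math


def grid_density_cluster(points, cell_size, min_pts, eps):
    """Hypothetical sketch: return one label per point (-1 means noise)."""
    dim = len(points[0])

    # 1. Bin every point into a non-overlapping grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(idx)

    # Offsets to all adjacent cells, excluding the cell itself.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    def neighbors(key):
        for o in offsets:
            nb = tuple(k + d for k, d in zip(key, o))
            if nb in cells:
                yield nb

    # 2. Assumption: a cell is "core" when it holds at least min_pts points.
    core = {k for k, members in cells.items() if len(members) >= min_pts}

    # 3. Merge adjacent core cells into clusters with a breadth-first search.
    cell_label = {}
    next_label = 0
    for start in sorted(core):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nb in neighbors(cur):
                if nb in core and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    labels = [-1] * len(points)
    for key, lab in cell_label.items():
        for idx in cells[key]:
            labels[idx] = lab

    # 4. "Suspected" cells: sparse cells next to a core cell. Their points
    #    are revisited one by one and joined to the cluster of the nearest
    #    core-cell point within eps (an assumed rule); others stay noise.
    for key, members in cells.items():
        if key in core:
            continue
        core_nbs = [nb for nb in neighbors(key) if nb in core]
        for idx in members:
            best = None  # (distance, label) of the closest qualifying point
            for nb in core_nbs:
                for j in cells[nb]:
                    d = math.dist(points[idx], points[j])
                    if d <= eps and (best is None or d < best[0]):
                        best = (d, cell_label[nb])
            if best is not None:
                labels[idx] = best[1]
    return labels


# Usage: two dense groups, one boundary point, one isolated point.
points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),  # dense cell (0, 0)
    (1.05, 0.3),                                      # sparse cell next to it
    (5.1, 5.1), (5.2, 5.3), (5.4, 5.2), (5.3, 5.4),  # dense cell (5, 5)
    (9.5, 0.5),                                       # isolated point
]
labels = grid_density_cluster(points, cell_size=1.0, min_pts=3, eps=0.8)
print(labels)
```

In this sketch only the points in sparse cells adjacent to core cells are compared individually, so most points are handled at cell level. The boundary point at (1.05, 0.3) is attached to the first cluster even though its own cell is sparse, while the isolated point remains noise.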

Keywords/Search Tags:

Data Clustering, Grid-based, Density-based, Precise Cluster Boundaries

Related items

1	Research On Performance Optimization And Parameter Selection Of Density Clustering Algorithm
2	Grid-based Density Clustering Algorithm
3	The Study Of Clustering Algorithm Based On Density
4	Research On Data Clustering Based On Grid
5	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
6	Research On An Effective Self Adapted Grid-Density Based Clustering
7	Density-based And Grid-based Uncertain Data Stream Clustering Algorithm In Vulnerability Detection
8	Research On Clustering Method Of Datastream Based On Grid And Density
9	Research On Improvement Of Clustering Algorithm Based On Density Peaks
10	Research On Data Stream Clustering Algorithm Based On Density Grid