
Research On The Grid-based And Density-based Clustering Algorithms

Posted on: 2008-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhang
Full Text: PDF
GTID: 2178360242467330
Subject: Software engineering
Abstract/Summary:
Data clustering is the process of finding similarities between data objects and grouping them into clusters according to a certain criterion. Its aim is to ensure that data within a cluster are highly similar while data in different clusters are highly dissimilar. Clustering is an essential way to understand the world around us and an important task in data mining. Existing clustering algorithms fall mainly into partitioning methods, hierarchical methods, density-based methods, grid-based methods, field topology methods, etc. Grid-based methods divide the data space into non-overlapping cells and cluster these cells instead of the original data points. Such methods are time-efficient, but cluster quality is unsatisfactory, especially at cluster boundaries. Density-based methods define clusters as densely populated regions and treat sparsely populated regions as noise. Such methods achieve high clustering quality, but their time cost is very high.
To solve this problem, this thesis proposes a new method (GDC4P) that combines the ideas of density-based and grid-based methods: it breaks up the cells located at cluster boundaries and processes the data points in them again for better precision. Unlike existing algorithms, the proposed algorithm computes summary information for each cell and treats it as the cell's property, clusters neighboring core cells, retrieves the suspected cells, and reloads the original data points contained in the suspected cells located at cluster boundaries. This gives the method good clustering precision at cluster boundaries.
Experimental evaluation shows that this method is more efficient than CLIQUE, with time complexity within O(n). It also offers a tremendous improvement in time efficiency over DBSCAN.
This hybrid grid- and density-based method suggests a new way of improving existing algorithms, and the idea of multi-level processing for precision is of great significance in both theory and practice.
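The abstract does not give the details of GDC4P, so the following is only a rough sketch of the general idea it describes: bin points into cells, connect neighboring core cells into clusters, then reload the individual points of sparse cells that border a cluster and decide them one by one. The function name and the parameters cell_size, min_cell_pts and eps are assumptions made for illustration, as is the nearest-point rule used for the boundary step; none of them come from the thesis.

```python
from collections import defaultdict, deque
from itertools import product
from math import dist, floor


def grid_density_cluster(points, cell_size, min_cell_pts, eps):
    """Illustrative grid + density sketch (not the thesis's GDC4P).

    Returns one label per point; -1 marks noise.
    """
    # Step 1: bin every point into a non-overlapping grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(floor(c / cell_size) for c in p)
        cells[key].append(idx)

    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    def neighbors(key):
        return [tuple(k + o for k, o in zip(key, off)) for off in offsets]

    # Step 2: a cell is "core" if it holds enough points (assumed rule).
    core = {k for k, members in cells.items() if len(members) >= min_cell_pts}

    # Step 3: connect neighboring core cells with a breadth-first search.
    cell_label = {}
    next_label = 0
    for start in sorted(core):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nb in neighbors(cur):
                if nb in core and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    labels = [-1] * len(points)
    for key, lab in cell_label.items():
        for idx in cells[key]:
            labels[idx] = lab

    # Step 4: "suspected" cells are sparse cells next to a core cell.
    # Reload their points and attach each one to the nearest core-cell
    # point if it lies within eps (assumed boundary rule).
    for key, members in cells.items():
        if key in core:
            continue
        core_nbs = [nb for nb in neighbors(key) if nb in core]
        if not core_nbs:
            continue
        candidates = [j for nb in core_nbs for j in cells[nb]]
        for idx in members:
            nearest = min(candidates, key=lambda j: dist(points[idx], points[j]))
            if dist(points[idx], points[nearest]) <= eps:
                labels[idx] = labels[nearest]
    return labels


points = [
    (0.1, 0.1), (0.2, 0.2), (0.9, 0.9),        # dense cell (0, 0)
    (10.1, 10.1), (10.2, 10.2), (10.3, 10.3),  # dense cell (10, 10)
    (1.05, 0.9), (1.9, 0.5),                   # sparse cell next to (0, 0)
    (5.0, 5.0),                                # isolated point
]
labels = grid_density_cluster(points, cell_size=1.0, min_cell_pts=3, eps=0.3)
print(labels)  # [0, 0, 0, 1, 1, 1, 0, -1, -1]
```

In this toy run the point (1.05, 0.9) sits in a sparse cell but is pulled into the first cluster because it lies close to a core-cell point, while (1.9, 0.5) in the same cell stays noise. A purely cell-level method would have to accept or reject both together, which is the kind of boundary imprecision the abstract refers to.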
Keywords/Search Tags: Data Clustering, Grid-based, Density-based, Precise Cluster Boundaries