Grid-based Clustering Algorithm Analysis And Research

Posted on: 2008-01-25  Degree: Master  Type: Thesis
Country: China  Candidate: M J Liu  Full Text: PDF
GTID: 2208360215460478  Subject: Computer software and theory
Abstract/Summary:
Data mining is a technique developed in recent years for discovering potentially useful knowledge in vast amounts of data, and it provides strong support for making business decisions on a scientific basis. With the rapid development of data mining, grid clustering, an important part of the field, has been widely applied in areas such as pattern recognition, data analysis, image processing, and market research. Research on grid clustering algorithms has become a highly active topic in data mining.
In the first part, we introduce the background of data mining and its underlying theory, and then briefly summarize related work on clustering analysis. Based on an analysis of traditional grid clustering algorithms, we put forward the grid-based shared nearest neighbor clustering algorithm (GNN). The basic procedure is to divide the spatial database into many grid cells and then map every data point into its cell. A query then needs to consider only the data in the relevant cells, which speeds up processing. GNN uses the grid to remove some outliers and noise from the dataset, and it handles the density threshold of the grid with a density-threshold method. It clusters by the shared nearest neighbor method and improves efficiency by using grid centers.
To address the measurement of similarity between objects, we put forward a similarity-based grid clustering algorithm (SGCA). It applies this similarity measure to grid clustering and handles the border points of clusters with a border-point threshold function, which markedly improves the precision of grid clustering. To improve the efficiency of SGCA, a technique based on grid cores is used.
In this thesis, we have developed implementations of the GNN, SGCA, SNN, and CLIQUE algorithms using Visual C++ 6.0.
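The grid mapping step described above can be illustrated with a minimal Python sketch. This is not the thesis implementation (which was written in Visual C++ 6.0); the function names, the fixed `cell_size`, and the `min_points` density threshold are all hypothetical choices made for illustration, and the abstract does not say how GNN actually sets its threshold.

```python
from collections import defaultdict
from itertools import product
from math import floor


def cell_of(point, cell_size):
    """Integer coordinates of the grid cell that contains `point`."""
    return tuple(floor(x / cell_size) for x in point)


def build_grid(points, cell_size):
    """Map every point into its grid cell (assumed: uniform cell size)."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, cell_size)].append(p)
    return dict(grid)


def dense_cells(grid, min_points):
    """Keep cells with at least `min_points` points; sparse cells are
    treated as outliers or noise (assumed: a simple count threshold)."""
    return {key: pts for key, pts in grid.items() if len(pts) >= min_points}


def candidates(grid, point, cell_size):
    """Points in the cell of `point` and its adjacent cells only,
    so a query does not have to scan the whole dataset."""
    key = cell_of(point, cell_size)
    found = []
    for offset in product((-1, 0, 1), repeat=len(key)):
        neighbor = tuple(k + o for k, o in zip(key, offset))
        found.extend(grid.get(neighbor, []))
    return found


points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (0.9, 0.8), (5.5, 5.5)]
grid = build_grid(points, cell_size=1.0)
dense = dense_cells(grid, min_points=2)
near = candidates(grid, (0.5, 0.5), cell_size=1.0)
```

In this toy dataset the isolated point `(5.5, 5.5)` falls alone into its own cell, so the count threshold discards it, and a query near the origin inspects only the four points in the neighboring cells.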
We conducted a series of experiments on both the correctness and the efficiency of grid clustering. The GNN and SGCA algorithms both scale well and can discover clusters of arbitrary shape. They are suitable not only for synthetic datasets but also give better clustering results on some high-dimensional datasets.
As the experimental results show, GNN addresses the sensitivity of grid clustering to its parameters through the density threshold of the grid, and it improves efficiency by using grid centers; SGCA removes some outliers and noise from the dataset, handles border points properly, and improves the precision of the clustering result. The efficiency of SGCA is improved by a technique based on grid cores.
To sum up, GNN not only clusters correctly but also finds outliers in the dataset, and it is insensitive to noise and to the order of data input. The precision of GNN is better than that of SNN. SGCA is suitable not only for synthetic datasets but also gives better clustering results on some high-dimensional datasets.
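The general shared nearest neighbor idea, applied to grid centers rather than to individual points, can be sketched as follows. The abstract does not give GNN's exact similarity definition, so this sketch assumes the common formulation in which two items are similar only if each appears in the other's k-nearest-neighbor list, with similarity equal to the number of neighbors they share. The `centers` dictionary, the cell labels, and `k` are hypothetical.

```python
from math import dist


def knn_sets(centers, k):
    """For each cell, the set of its k nearest other cells by center distance."""
    result = {}
    for a in centers:
        others = [b for b in centers if b != a]
        others.sort(key=lambda b: dist(centers[a], centers[b]))
        result[a] = set(others[:k])
    return result


def snn_similarity(knn, a, b):
    """Shared nearest neighbor similarity (assumed definition): the number
    of common neighbors, or 0 if a and b are not in each other's lists."""
    if a not in knn[b] or b not in knn[a]:
        return 0
    return len(knn[a] & knn[b])


# Hypothetical grid centers: four cells close together and one far away.
centers = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (0.0, 1.0),
    "d": (1.0, 1.0),
    "e": (10.0, 10.0),
}
knn = knn_sets(centers, k=3)
sim_ab = snn_similarity(knn, "a", "b")
sim_ae = snn_similarity(knn, "a", "e")
```

Working on cell centers instead of raw points shrinks the number of distance computations from the number of points to the number of occupied cells, which is one plausible reading of how using the grid center improves efficiency.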
Keywords/Search Tags:Data Mining, grid, clustering, shared nearest neighbor, density threshold, center, similarity, threshold function, cores