
Grid Clustering Algorithm

Posted on: 2007-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Zhang
GTID: 2208360185971918
Subject: Computer software and theory
Abstract/Summary:
Data mining techniques extract potentially useful knowledge from large volumes of data, and they play an increasingly significant role in making use of stored data in the information age. With the rapid development of data mining, grid clustering, an important part of the field, has been widely applied in areas such as pattern recognition, data analysis, image processing, and market research. Research on grid clustering algorithms has become a highly active topic in data mining.

In this thesis, the author presents the theory of data mining and analyzes grid clustering algorithms in depth. Three algorithms are proposed. First, based on an analysis of traditional grid clustering algorithms, we propose a grid-based border points extraction algorithm (GBD), which improves the precision of grid clustering by extracting border points. Second, because grid clustering algorithms are sensitive to their parameters, we propose a grid-based clustering algorithm with parameter automatization (PAG) that addresses this sensitivity. Third, based on an analysis of traditional algorithms for multi-density data, we propose a grid-based clustering algorithm for multi-density data (GDD). GDD is a multi-stage clustering algorithm that integrates grid-based clustering, density threshold descending, and border points extraction.

We developed the GBD, PAG, GDD, and SNN algorithms and implemented them using Visual C++ 6.0.
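To make the general idea of grid clustering concrete, the following is a minimal sketch of a generic grid-based density clustering pass. It is not the thesis's GBD, PAG, or GDD algorithm, whose details the abstract does not give, and it is written in Python rather than the Visual C++ used by the author. The function name `grid_cluster` and the parameters `cell_size` and `min_pts` are assumptions made for illustration: points are binned into square cells, cells holding at least `min_pts` points are treated as dense, adjacent dense cells are merged into one cluster, and points in sparse cells are labeled -1.

```python
from collections import defaultdict, deque
from itertools import product
import math


def grid_cluster(points, cell_size, min_pts):
    """Return one label per point: a cluster id, or -1 if its cell is sparse.

    Illustrative sketch only; `cell_size` and `min_pts` are hypothetical
    parameter names, not taken from the thesis.
    """
    # Bin each point into a grid cell identified by integer coordinates.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(i)

    # A cell is dense when it holds at least min_pts points.
    dense = {key for key, idx in cells.items() if len(idx) >= min_pts}

    # Merge adjacent dense cells with a breadth-first search.
    cell_label = {}
    next_id = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            # Visit every neighbor that differs by at most 1 in each dimension.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_id
                    queue.append(nb)
        next_id += 1

    # Points in dense cells inherit the cell's cluster id; the rest stay -1.
    labels = [-1] * len(points)
    for cell, cid in cell_label.items():
        for i in cells[cell]:
            labels[i] = cid
    return labels
```

In this sketch the result depends directly on `cell_size` and `min_pts`, which illustrates the parameter sensitivity the abstract refers to. Every point in a sparse cell is labeled -1, even when that cell borders a dense one, so points on the edge of a cluster can be misclassified as outliers.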
We conducted a series of experiments: a test of the correctness of grid clustering, and experiments on synthetic datasets, on a real dataset, and on a dataset with even density.

The experimental results show that the GBD algorithm handles border points properly and improves the precision of the clustering result, and that the PAG algorithm addresses the sensitivity of grid clustering to its parameters. The GDD algorithm not only clusters correctly but also finds outliers in the dataset, which effectively solves the problem that traditional grid algorithms can either cluster or find outliers, but not both. The precision of the GDD algorithm is better than that of SNN. The GDD algorithm works well for even-density datasets and many multi-density datasets; it can...
Keywords/Search Tags: grid clustering, density threshold descending, multi-stage clustering, border points extraction, parameter automatization