Study On Grid-based Clustering Algorithms | Posted on:2011-02-07 | Degree:Master | Type:Thesis | Country:China | Candidate:H Zhao | GTID:2178360308965523 | Subject:Management Science and Engineering
Abstract/Summary:
As information technologies have come into wide use in many areas of human life, the data collected in these areas grows exponentially every day. How to extract knowledge from these data is a problem that must be solved. Data mining is a technique developed in recent years that can discover potentially useful knowledge in vast amounts of data, and it provides powerful support for making business decisions on a scientific basis. Clustering analysis is one of the main techniques of data mining. It partitions data objects into clusters composed of similar objects. Research on grid-based clustering algorithms has become a highly active topic in clustering analysis.

In the first part, we introduce the background of data mining and its theoretical foundations. We then briefly summarize related work on clustering analysis. Building on an analysis of traditional clustering algorithms, we study grid-based clustering algorithms in particular and compare the traditional and improved grid-based clustering algorithms. We analyze the effect of the parameters of grid-based clustering algorithms, as well as the advantages and disadvantages of these algorithms.

The study builds on and improves the clustering algorithm SGRIDS (a Scalable GRID-based clustering algorithm for very large Spatial databases), which handles large spatial databases. SGRIDS introduces a new grid-based data compression framework. In this framework, data are compressed only when they belong to a cluster. SGRIDS produces accurate clusters in a single scan over a data set.
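As a rough illustration of the general grid-based approach discussed above, the following minimal Python sketch partitions the space into equal cells, counts the points in each cell, keeps the cells whose count reaches a density threshold, and merges adjacent dense cells into clusters. It is a generic sketch and not the SGRIDS algorithm: the function name `grid_cluster` and the parameters `cell_size` and `density_threshold` are assumptions made for this example, and the compression framework is not modeled.

```python
from collections import defaultdict, deque
from itertools import product


def cell_of(point, cell_size):
    """Return the integer grid coordinates of the cell containing a point."""
    return tuple(int(c // cell_size) for c in point)


def grid_cluster(points, cell_size, density_threshold):
    """Label each point with a cluster id, or -1 for noise (hypothetical helper)."""
    # One pass over the data: count the points that fall into each cell.
    counts = defaultdict(int)
    for p in points:
        counts[cell_of(p, cell_size)] += 1

    # A cell is dense when its point count reaches the threshold.
    dense = {cell for cell, n in counts.items() if n >= density_threshold}

    # Breadth-first search joins dense cells that touch (diagonals included).
    labels, next_label = {}, 0
    for start in dense:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in labels:
                    labels[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1

    # Points in cells that are not dense are treated as noise.
    return [labels.get(cell_of(p, cell_size), -1) for p in points]
```

Because clusters are formed from cells rather than from individual points, the result of this sketch does not depend on the order in which the points are read.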
Because the input parameters have a great impact on the quality of clustering algorithms, this work improves how the value of the density threshold is set and so reduces the impact of the input parameter. The improved algorithm can find clusters of arbitrary shape and is not affected by the input order of the data. The experiments show that the improved algorithm attains a better clustering result.

It was found that most grid-based clustering methods compute the density of a cell by counting the data points in it. This method loses much of the influence that data points exert on the region where they reside. The loss of influence makes it more likely that data points are assigned to different clusters even when they are closer to each other than to other points. To overcome this shortcoming, the concept of Contribution from CGDP (hybridization of Clustering based on Grid and Density with Particle swarm optimization) is introduced, and a novel method is presented for computing the density of a grid cell based on the idea of an influence function. The Contribution is the influence that the cells near a data point receive from that point. The Contribution is then used as the grid density, and the density threshold of the grid is handled by the density threshold method. The experimental results suggest that the new method using Contribution can reduce the loss of influence. The improved grid-based clustering algorithm can remove outliers or noise from the dataset.
Keywords/Search Tags: clustering analysis, grid, grid-based data compression, density threshold, contribution
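The difference between counting points and using an influence function can be illustrated with a small Python sketch. Here every point adds a distance-weighted amount to the cells around it, so a point close to a cell border also raises the density of the neighbouring cell. The Gaussian kernel, the parameters `sigma` and `reach`, and the name `contribution_density` are assumptions chosen for this example; the abstract does not give the actual definition of Contribution.

```python
import math
from collections import defaultdict
from itertools import product


def contribution_density(points, cell_size, sigma=1.0, reach=1):
    """Sum an assumed Gaussian influence of each point over nearby cell centres."""
    density = defaultdict(float)
    for p in points:
        # Cell that contains the point.
        home = tuple(int(c // cell_size) for c in p)
        # Visit the home cell and every cell within `reach` cells of it.
        for offset in product(range(-reach, reach + 1), repeat=len(p)):
            cell = tuple(h + o for h, o in zip(home, offset))
            centre = [(k + 0.5) * cell_size for k in cell]
            dist2 = sum((a - b) ** 2 for a, b in zip(p, centre))
            # The influence decays smoothly with distance to the cell centre.
            density[cell] += math.exp(-dist2 / (2 * sigma ** 2))
    return dict(density)
```

With plain counting, two points that sit on either side of a cell border each add 1 to a different cell. In this sketch both points add to both cells, so the two cells end up with similar densities.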