Font Size: a A A

Study On Grid-Based Clustering Algorithms

Posted on:2007-03-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y F SunFull Text:PDF
GTID:1118360242461923Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As the information technologies have been used widely in many fields of human life, the data gained from these fields grows exponentially every day. How to extract knowledge from these data is a problem that should be solved. The aim of data mining is to extract knowledge from large data sets. Clustering analysis is a main technique of data mining. It partitions data objects into clusters that are composed by similar objects.Grid has been used in spatial data analysis, index, and clustering. Grid-based data analysis methods partition a data space into rectangle cells, and analysis is performed on these cells. A data space may be partitioned in many ways. Thereinto, hierarchical partition and p×p partition are mostly used.Grid can compress data by substituting the data in each cell by the statistics of these data. Cells, micro-clusters, and antibodies are similar in data compression, but they have many different characters. And the operations in clustering algorithms that use these compression methods are different.Grid-based algorithms can be parallelized easily because the data in a grid can be partitioned in many ways.Traditional clustering algorithms assume that all data points in a cell belong to the same cluster. But this assumption is not always reasonable. So a new grid-based data compression framework is proposed. In this framework, data are compressed only when they belong to the same cluster. In a grid, the data points in a cell that is totally inside a cluster must belong to that cluster. Based on the observations of the relationships between cells and clusters in a space, the new data compression framework partitions a space unevenly. The cells totally inside a cluster have coarse granularity, and the cells on the border of clusters have fine granularity.A clustering algorithm SGRIDS is designed based on this new data compression framework. The algorithm judges the position of a cell based on the data density in this cell. SGRIDS produces accurate clusters through one scan over a data set. Unlike traditional grid-based clustering algorithms, SGRIDS is not sensitive to the granularity of cells. The computational complexity of SGRIDS is O ( N ). The algorithm can find clusters with arbitrary shapes, and it is not affected by the input order of data. The experiments show that SGRIDS outperforms the classic clustering algorithm WaveCluster.The research on data stream mining is very hot in these years. Clustering algorithms for data streams should fulfill many new requirements. To satisfy these requirements, stream clustering algorithms normally compress stream data online into a summary data structure. A clustering algorithm called GCHDS is proposed to deal with high-dimensional data streams. This algorithm uses grid as the summary data structure. It chooses useful dimensions to perform clustering based on the distribution of data on each dimension. The performance study on a real data set demonstrates the efficiency and effectiveness of GCHDS. It outperforms HPStream that is proposed in a VLDB paper.GSCDS is a subspace clustering algorithm that is proposed to deal with high-dimensional data streams. This algorithm combines the bottom-up grid partition method and top-down grid partition method. An evenly partitioned grid is used to compress stream data online. A top-down partition method is used to separate the clusters in this grid, and each cluster is associated with a subspace. Then the algorithm labels connected cells in each subspace as clusters. Lastly, the algorithm eliminates the errors induced by the top-down partition. The complexity analysis and performance study with real data sets and a synthetic data set demonstrate the efficiency and effectiveness of GSCDS.
Keywords/Search Tags:clustering, grid method, data compression, data stream, subspace clustering
PDF Full Text Request
Related items