Font Size: a A A

The Research On Spatial Data Clustering Based On Space Partition

Posted on:2011-01-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:M HuangFull Text:PDF
GTID:1228330332982980Subject:Photogrammetry and Remote Sensing
Abstract/Summary:PDF Full Text Request
Spatial clustering methods are important components of spatial data mining technology. It is a process of aggregating spatial dataset into clusters according to certain rules and methods. It has been widely applied to spatial database analysis. As an important spatial data mining technique, it has been widely used in applications such as GIS, remote sensing image processing and so forth. Also, image recognition, pattern acquisition, and statistical trend analysis also use spatial clustering methods.When dealing with large-scale spatial datasets, traditional spatial clustering methods cannot load dataset into memory at one time. Under the neighboring search mode, which uses the distances between data object for calculation, the algorithm needs to scan data object on the disk repeatedly and thus incur large amount of I/O overhead, which will severely affect the efficiency. In order to effectively improve the efficiency of the clustering algorithm, proper method should be adopted to reduce the quantity of spatial datasets and achieve correct clustering result at lowest price for I/O. Data partition is an effective approach. Using data partitioning, the algorithm firstly partitions the dataset and then read each part into memory to perform clustering, the efficiency can be greatly enhanced. Because data partition is done for the purpose of loading dataset into memory, the clustering algorithm which uses data partition need to determine the number of partitions, sizes and locations, and also need to combine the partition result accurately.Grid partition is a structure that uses the dividing lines parallel to data axis to divide the data space. Grid structure can divide the data space into multiple child data space, either evenly or unevenly, and can easily characterize these sub-data space respect to the size and location. By using such structure, partition clustering can be achieved more convenient and straightforward. In this thesis, a grid-based dynamic data partition method for large spatial dataset clustering is proposed. According to density-based spatial clustering method, the algorithm is designed and implemented, in which a one way neighborhood search method based on principle of equal in data’s neighborhood is proposed and used. The algorithm divide the data space based on grid partition method, and achieve dynamic data partition based clustering by reading unit cell by stages. Experiments prove that such algorithm can deal with large data set spatial data clustering computation with reliable clustering result and high efficiency.For data sets in high dimensional space and large number of empty grid cell generated by small spatial data grid, low efficiency of the grid unit queries is an unavoidable problem. In order to tackle this problem, a grid search index tree QG-Tree is proposed in this thesis. It is based on the quad tree partitioning technique and designed to improve the search efficiency of grid cell. The index structure partition the grid space into a grid hierarchy tree like a quad tree, which contains the root node, intermediate nodes, leaf nodes and so forth. For the index tree, each node corresponds to the smallest size of non-empty unit of each partition. This approach can effectively reduce the grid storage consumption and at the same time largely improve the search efficiency of grid cell. This thesis gives the definition of QG-tree as well as the design of grid search algorithm.Based on the study of density and grid-based clustering algorithm, which includes the comparison of both their advantages and disadvantages, the thesis proposes a clustering algorithm based on grid-density. The main idea is to obtain dense grid cell via calculation, which defines the dense grid cell as one that contains at least one core clustering object and obtains all the dense grid cells via incomplete neighborhood search method based on density-based clustering. During the query process, it also combines dense grid cells and expend the clusters. Since this method only deals with part of the dataset at a time, the computation consumption can be effectively reduced so as to improve the efficiency of the algorithm. At the same time, the algorithm uses QG-Tree to build a grid index structure, which improves the search efficiency of grid cells. Experiments prove that the algorithm has a high spatial clustering performance.At last, this thesis makes a conclusion of the study and discusses the future direction of following study.
Keywords/Search Tags:Spatial data mining, Spatial clustering, Space partition, Data partition, Spatial indexing
PDF Full Text Request
Related items