Cluster Research On Spatial Data

Posted on: 2008-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Sun
GTID: 1118360245491013
Subject: Computer application technology
Abstract/Summary:
Building effective and scalable clustering algorithms for large-scale, high-dimensional data under many kinds of constraints is one of the active research directions in data mining. To address these issues, this dissertation studies several clustering algorithms, as follows.

Based on an analysis of density-based and grid-based algorithms, three algorithms are proposed: CluGD, GDRS and VCluGD. The CluGD algorithm first obtains representative points and then clusters them with a density-based method. The representative points are not actual data points but virtual points that reflect the data space. Although the algorithm takes the same parameters as DBSCAN, the grid method makes it considerably more efficient. The GDRS algorithm uses random sampling to manage the representative points. Because the density of large-scale data varies widely, a single parameter cannot accurately reflect the internal characteristics of the data space, so the VCluGD algorithm extends CluGD. In a preprocessing step based on the neighborhood radius, VCluGD builds a graph of the relationship between density and the number of points. This makes it convenient for users to set multi-level parameters and yields better clustering results. All three algorithms run in time linear in the size of the data set and are suitable for large-scale clustering.

After studying the strengths and weaknesses of clustering algorithms that handle constraints on non-spatial attributes, the DBSCAN+ algorithm is proposed as an extension of DBSCAN, and the SOM algorithm is adopted as an auxiliary means of handling high-dimensional non-spatial attributes. DBSCAN+ computes dissimilarity according to the data type of each non-spatial attribute, and experimental results are presented.
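
The grid-plus-density idea behind CluGD, as summarized in this abstract, can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the function names, the use of cell centroids as the virtual representative points, and the weighting of each representative by its point count are all assumptions made for illustration. The parameters eps and min_pts mirror the two DBSCAN parameters the abstract refers to.

```python
from collections import defaultdict, deque
import math


def grid_representatives(points, cell):
    """Return {cell_key: (centroid, count)} for every occupied grid cell.

    The centroid is a virtual point (assumed choice); it need not coincide
    with any actual data point.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for x, y in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {k: ((sx / n, sy / n), n) for k, (sx, sy, n) in sums.items()}


def cluster_representatives(reps, eps, min_pts):
    """DBSCAN-style expansion over representatives; -1 marks noise.

    A representative is treated as a core point when the total point count
    of all representatives within eps reaches min_pts.
    """
    keys = list(reps)

    def neighbors(k):
        # Brute-force scan for brevity; a grid method would normally
        # inspect only the adjacent cells.
        (x, y), _ = reps[k]
        return [j for j in keys
                if math.hypot(reps[j][0][0] - x, reps[j][0][1] - y) <= eps]

    def weight(group):
        return sum(reps[j][1] for j in group)

    labels = {}
    cluster = 0
    for k in keys:
        if k in labels:
            continue
        nb = neighbors(k)
        if weight(nb) < min_pts:
            labels[k] = -1
            continue
        labels[k] = cluster
        queue = deque(nb)
        while queue:
            j = queue.popleft()
            if labels.get(j, -1) != -1:
                continue  # already assigned to a cluster
            labels[j] = cluster
            nb_j = neighbors(j)
            if weight(nb_j) >= min_pts:
                queue.extend(nb_j)
        cluster += 1
    return labels
```

Building the representatives takes one pass over the data, which is where a grid method gains its speed: the density step then works on occupied cells rather than on every point.
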
The auxiliary method first uses the SOM algorithm to choose suitable dimensions for the clustering task. Then either DBSCAN+ clusters on these candidate dimensions, or SOM clusters them directly, and the results of the two clustering algorithms are combined. The experimental results show that the method is effective.

In view of the shortcomings of existing clustering algorithms with respect to spatial constraints, the DBOF algorithm is proposed. In this algorithm, each spatial constraint is marked as an obstacle, a facility, or both an obstacle and a facility. A polygon model represents obstacles, a graph structure manages facilities, and objects of the third kind are expressed by a special graph structure with traversing-point attributes. The complete obstacle distance is used to measure the distance between two obstacles, and graph structures model the other two kinds of constraints, which benefits practical application of the DBOF algorithm. The experiments show that DBOF produces better clustering results with high efficiency.
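
As a rough illustration of how a polygon obstacle can enter a density-based neighborhood query, the sketch below counts two points as neighbors only when the straight segment between them does not cross any obstacle edge. This is a simple visibility filter assumed for illustration; it is not the complete obstacle distance or the DBOF algorithm from the dissertation, and all function names are hypothetical.

```python
import math


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): -1, 0 or 1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p, q, a, b):
    """True if segment pq properly crosses segment ab.

    Touching at an endpoint or overlapping collinear segments are not
    detected; this simplification is an assumption of the sketch.
    """
    return (_orient(p, q, a) * _orient(p, q, b) < 0
            and _orient(a, b, p) * _orient(a, b, q) < 0)


def blocked(p, q, polygon):
    """True if segment pq crosses any edge of the polygon (a vertex list)."""
    n = len(polygon)
    return any(segments_cross(p, q, polygon[i], polygon[(i + 1) % n])
               for i in range(n))


def obstacle_neighbors(points, idx, eps, obstacles):
    """Indices of points within eps of points[idx] and not cut off by an obstacle."""
    p = points[idx]
    return [j for j, q in enumerate(points)
            if j != idx
            and math.hypot(q[0] - p[0], q[1] - p[1]) <= eps
            and not any(blocked(p, q, poly) for poly in obstacles)]
```

Substituting a query like this for the plain eps-neighborhood in a DBSCAN-style loop keeps clusters from growing through a wall, although a distance that routes around obstacles would be needed to model detours.
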
Keywords/Search Tags: Data Mining, Cluster, Constraint, Self-Organizing Map, Grid-based algorithm, Density-based algorithm