Cluster Research On Spatial Data

Posted on:2008-09-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z W Sun

Full Text:PDF

GTID:1118360245491013

Subject:Computer application technology

Abstract/Summary:


Given large-scale, high-dimensional data and many kinds of constraints, building effective and scalable clustering algorithms is one of the active research directions in data mining. To address these issues, this dissertation studies the following clustering algorithms in depth.

Based on an analysis of density-based and grid-based algorithms, three algorithms are proposed: CluGD, GDRS and VCluGD. CluGD first obtains representative points and then clusters these representative points with a density-based method. Here a representative point is not an actual data point but a virtual point that reflects the distribution of data in the space. Although CluGD takes the same parameters as DBSCAN, it is considerably more efficient because it uses a grid. GDRS uses random sampling to manage the representative points. Because the density of large-scale data can vary greatly, a single parameter cannot accurately reflect the internal characteristics of the data space, so VCluGD extends CluGD. VCluGD runs a preprocessing step based on the neighborhood radius to obtain a graph relating density to the number of points; this makes it convenient for users to set multi-level parameters and yields better clustering results. All three algorithms run in time linear in the size of the data set, so they are all suitable for large-scale clustering.

After studying and analyzing the strengths and weaknesses of clustering algorithms that can handle constraints on non-spatial attributes, the dissertation proposes the DBSCAN+ algorithm, based on DBSCAN, and further proposes using the SOM algorithm as an auxiliary means of handling high-dimensional non-spatial attributes. DBSCAN+ computes the dissimilarity of non-spatial attributes according to their data types, and experimental results are reported.
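The abstract does not state how DBSCAN+ measures the dissimilarity of mixed-type non-spatial attributes or how it combines that measure with spatial distance. The following Python sketch is therefore only an illustration under stated assumptions: it swaps in the standard Gower coefficient (range-normalized absolute difference for numeric attributes, 0/1 mismatch for categorical ones) and assumes that two points are neighbors only if they lie within eps of each other spatially and their attribute dissimilarity is at most a hypothetical threshold max_dissim. The function names and the "num"/"cat" kind labels are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple, Union

Attr = Union[float, str]
NOISE = -1
UNVISITED = -2


def gower_dissimilarity(a: Sequence[Attr], b: Sequence[Attr],
                        kinds: Sequence[str], ranges: Sequence[float]) -> float:
    """Gower-style dissimilarity in [0, 1] for mixed attribute types.

    kinds[k] is "num" (numeric) or "cat" (categorical); ranges[k] is the
    observed value range of numeric attribute k (ignored for "cat").
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            total += abs(x - y) / r if r > 0 else 0.0
        else:  # categorical: 0 if equal, 1 otherwise
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


def dbscan_plus_sketch(points: Sequence[Tuple[float, float]],
                       attrs: Sequence[Sequence[Attr]],
                       kinds: Sequence[str],
                       eps: float, min_pts: int, max_dissim: float) -> List[int]:
    """Plain DBSCAN whose neighborhood also requires non-spatial similarity."""
    n = len(points)
    ranges = []
    for k, kind in enumerate(kinds):
        if kind == "num":
            col = [row[k] for row in attrs]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(0.0)

    def neighbors(i: int) -> List[int]:
        # Assumed neighbor rule: spatially within eps AND attribute-similar.
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps
                and gower_dissimilarity(attrs[i], attrs[j], kinds, ranges) <= max_dissim]

    labels = [UNVISITED] * n
    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(nbrs)
        cluster += 1
    return labels


if __name__ == "__main__":
    # Toy data: two touching 2x2 blocks with different (category, value) attributes.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]
    attrs = [("res", 10.0)] * 4 + [("ind", 50.0)] * 4
    print(dbscan_plus_sketch(pts, attrs, ["cat", "num"], eps=1.5, min_pts=3, max_dissim=0.3))
    # prints [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy data the two blocks are spatially adjacent, so ordinary DBSCAN with the same eps and min_pts would merge them; with the assumed attribute threshold they stay separate, and relaxing max_dissim to 1.0 merges them into a single cluster.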
The auxiliary method works as follows: the SOM algorithm first selects the dimensions suited to the clustering goal; then DBSCAN+ clusters on these candidate dimensions (or SOM clusters them directly), and the results of the two clustering algorithms are combined. Experimental results show that the method is effective.

To address the shortcomings of existing clustering algorithms in handling spatial constraints, the DBOF algorithm is proposed. DBOF classifies spatial constraints as obstacles, facilities, or objects that are both obstacles and facilities. Obstacles are represented with a polygon model, facilities with a graph structure, and objects of the third kind with a special graph structure whose attributes describe their traversal points. The complete obstacle distance is used to measure the distance between two objects separated by obstacles, and graph structures are used to model the other two kinds of constraints, which benefits the practical application of DBOF. Experiments show that DBOF obtains better clustering results with high efficiency.
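The abstract does not define the complete obstacle distance precisely, and DBOF's graph models for facilities are not described in enough detail to reproduce. As an illustrative sketch only, the Python code below computes the common visibility-graph form of obstructed distance for polygonal obstacles: the shortest path between two points that may bend only at obstacle vertices and never passes through an obstacle's interior, found with Dijkstra's algorithm. Facilities and obstacle-facility objects are not modeled, the polygon format (an ordered list of vertices) is an assumption, and the visibility test is simplified so that it is only reliable for convex obstacles in general position.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # assumed format: vertices listed in boundary order


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); its sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _proper_cross(p: Point, q: Point, a: Point, b: Point) -> bool:
    """True if segments pq and ab cross at a point interior to both."""
    return (_cross(p, q, a) * _cross(p, q, b) < 0
            and _cross(a, b, p) * _cross(a, b, q) < 0)


def _on_segment(p: Point, a: Point, b: Point) -> bool:
    return (abs(_cross(a, b, p)) < 1e-12
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def _strictly_inside(p: Point, poly: Polygon) -> bool:
    """Ray-casting point-in-polygon test; boundary points count as outside."""
    n = len(poly)
    edges = [(poly[i], poly[(i + 1) % n]) for i in range(n)]
    if any(_on_segment(p, a, b) for a, b in edges):
        return False
    inside = False
    for (x1, y1), (x2, y2) in edges:
        if (y1 > p[1]) != (y2 > p[1]):
            x_int = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if x_int > p[0]:
                inside = not inside
    return inside


def visible(p: Point, q: Point, obstacles: List[Polygon]) -> bool:
    """Simplified visibility: no proper edge crossing and midpoint not inside."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    for poly in obstacles:
        n = len(poly)
        if any(_proper_cross(p, q, poly[i], poly[(i + 1) % n]) for i in range(n)):
            return False
        if _strictly_inside(mid, poly):
            return False
    return True


def obstructed_distance(s: Point, t: Point, obstacles: List[Polygon]) -> float:
    """Shortest obstacle-avoiding path length via visibility graph + Dijkstra."""
    nodes = [s, t] + [v for poly in obstacles for v in poly]
    dist = [math.inf] * len(nodes)
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == 1:  # node 1 is the target t
            return d
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(len(nodes)):
            if j != i and visible(nodes[i], nodes[j], obstacles):
                nd = d + math.dist(nodes[i], nodes[j])
                if nd < dist[j]:
                    dist[j] = nd
                    heapq.heappush(heap, (nd, j))
    return math.inf  # t is unreachable


if __name__ == "__main__":
    wall = [(1, -1), (2, -1), (2, 1), (1, 1)]  # square obstacle between the points
    print(round(obstructed_distance((0, 0), (3, 0), [wall]), 4))  # prints 3.8284
```

In the example the straight segment has length 3, but it crosses the square, so the path detours around a corner pair at length 1 + 2·sqrt(2). In a DBOF-style clusterer, a distance of this kind would replace Euclidean distance in the eps-neighborhood query.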

Keywords/Search Tags:

Data Mining, Cluster, Constraint, Self-Organizing Map, Grid-based algorithm, Density-based algorithm


Related items

1	The Research And Application Of Data Mining Based On Grid-Density
2	Grid-based Density Clustering Algorithm
3	Large-scale Scientific Data Mining Density Clustering Algorithm
4	Research On Data Stream Clustering Algorithm Based On Double-layer Grid And Density
5	Research On Performance Optimization And Parameter Selection Of Density Clustering Algorithm
6	The Study Of Clustering Algorithm Based On Density
7	A Clustering Algorithm Based On Density With Its Application In The Customer Cluster In The Field Of Telecom
8	Research On Data Stream Clustering Algorithm Based On Density Grid
9	Application Of Grid And Density Based Clustering Algorithm In Data Mining
10	An Algorithm Based On Density And Grid For Mining And Clustering Association Rules