Research Of Key Techniques On Spatial Data Mining Based On Spatial Autocorrelation

Posted on: 2008-02-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C P Hu
Full Text: PDF
GTID: 1118360272476798
Subject: Computer application technology
Abstract/Summary:
The rapid development of computer, network, spatial data collection, and spatial database technologies has made spatial data larger, more complex, and more changeable, to the point that it exceeds the human ability to analyze it. The demand for discovering knowledge from spatial databases has therefore grown steadily, and a new research field has emerged to meet it: spatial data mining. Spatial data mining refers to the extraction from spatial databases of implicit knowledge, spatial relations, or significant features or patterns that are not explicitly stored in those databases. It is a new, multidisciplinary area that combines techniques from data mining, machine learning, pattern recognition, spatial databases, statistics, artificial intelligence, geographic information systems, remote sensing, and decision support systems.
This dissertation first introduces the basic theory of spatial data mining systematically and compares traditional data mining with spatial data mining. Because of the characteristics of spatial data, traditional data mining techniques are ill-suited to mining knowledge from spatial databases. To mine novel, effective, and understandable knowledge from spatial databases, new theories, techniques, and methods must be studied. The research in this dissertation focuses on spatial clustering, spatial co-location rules, and spatial classification and prediction.
The main contributions of this dissertation are as follows. First, an improved density-based spatial clustering algorithm with sampling (IDBSCAS), based on DBSCAN, is proposed; it not only clusters large-scale spatial databases effectively but also considers both spatial and non-spatial attributes.
Because this algorithm adopts a new sampling technique, it does not need to execute a region query for every object in the neighborhood of a purity-core object, which saves a great deal of clustering time. In addition, by introducing the concept of the matching neighborhood, it considers non-spatial attributes as well as spatial attributes, which improves clustering quality. Experimental results on 2-D spatial datasets show that IDBSCAS outperforms DBSCAN in both clustering efficiency and clustering quality.
Second, although there has been some research on mining spatial co-location rules, most researchers discuss only positive spatial co-location rules and do not consider negative ones. A novel positive and negative spatial co-location rule mining algorithm (PNSCLRMA) is proposed, which mines both positive and negative spatial co-location rules. To reduce the computational cost, the algorithm uses two optimization techniques: adopting star neighborhoods to reduce join operations, and defining an interesting degree to prune uninteresting spatial co-location patterns. Experimental results show that the algorithm can efficiently mine positive and negative spatial co-location rules.
Third, a new spatial prediction model (MLR*) based on the multivariate linear regression (MLR) model is proposed. Spatial information is first added to the input variables by replacing each input variable with the weighted average of its neighbors; the new input variables are then fed to an MLR model to estimate the model parameters, and spatial prediction is made with the fitted model.
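
The MLR* idea described above can be sketched roughly as follows. This is a minimal illustration, not the dissertation's implementation: the abstract does not say how neighbors or weights are defined, so the k-nearest-neighbor inverse-distance weighting and the names `neighbor_weights`, `fit_mlr_star`, and `predict_mlr_star` are assumptions made only for this example.

```python
import numpy as np

def neighbor_weights(coords, k=4):
    """Row-normalized inverse-distance weights over each point's k nearest
    neighbors. The weighting scheme is an assumption of this sketch."""
    n = len(coords)
    # Pairwise Euclidean distances between all locations.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d[i])[:k]
        W[i, nbrs] = 1.0 / (d[i, nbrs] + 1e-12)  # guard against zero distance
    return W / W.sum(axis=1, keepdims=True)

def fit_mlr_star(X, y, W):
    """Replace each input variable with the weighted average of its
    neighbors' values, then fit ordinary least squares with an intercept."""
    X_lag = W @ X
    A = np.column_stack([np.ones(len(X_lag)), X_lag])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta  # beta[0] is the intercept, beta[1:] the slopes

def predict_mlr_star(X, W, beta):
    """Predict with the fitted coefficients on neighbor-averaged inputs."""
    return beta[0] + (W @ X) @ beta[1:]
```

Because the spatial step is a single matrix product followed by an ordinary least-squares fit, a sketch like this avoids the iterative estimation that a spatial autoregressive model typically needs, which is consistent with the efficiency claim in the abstract.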
Experimental results show that the MLR* model and the spatial auto-regression (SAR) model have almost identical spatial prediction performance, while the MLR* model is computationally more efficient than the SAR model.
Finally, a spatial classification and prediction algorithm based on fuzzy c-means (SFCM) is proposed by introducing the concept of the fuzzy membership degree of a spatial object to a fuzzy cluster. First, the algorithm clusters the dataset with fuzzy c-means; because spatial data are spatially autocorrelated, spatial information must be added to the fuzzy c-means algorithm for spatial clustering. Second, it computes the fuzzy membership degree of each spatial object to all fuzzy clusters and finds the cluster for which its membership degree is maximal. Last, the dependent variable value of the spatial object is estimated by the dependent variable value of the mean object of that cluster. Theoretical analysis and experimental results show that the algorithm outperforms the SAR model and the CPFCM method in classification and prediction accuracy, and is faster than the SAR model.
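
The three SFCM steps above (cluster, find the maximum-membership cluster, predict from that cluster's mean object) can be sketched as below. This is a hedged illustration using plain fuzzy c-means, not the dissertation's algorithm: the abstract does not specify how spatial information enters the clustering, so here the feature vectors `X` are simply assumed to already contain whatever spatial information is needed (for example, appended coordinates). The function names, and the choice to take each cluster's dependent value as the membership-weighted mean of `y`, are likewise assumptions.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Standard fuzzy c-means membership degrees; each row sums to 1."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero at a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns the cluster centers and memberships."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def cluster_targets(U, y, m=2.0):
    """Dependent value of each cluster's mean object, taken here as the
    membership-weighted mean of y (an assumption of this sketch)."""
    Um = U ** m
    return (Um.T @ y) / Um.sum(axis=0)

def predict(X_new, centers, y_centers, m=2.0):
    """Assign each object to its maximum-membership cluster and return
    that cluster's dependent value."""
    best = memberships(X_new, centers, m).argmax(axis=1)
    return y_centers[best]
```

A typical call sequence would be `centers, U = fcm(X, c)`, then `y_centers = cluster_targets(U, y)`, then `predict(X_new, centers, y_centers)`. Prediction requires only a distance computation to `c` centers per object, which may help explain why such an approach can run faster than fitting a SAR model.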
Keywords/Search Tags: spatial data mining, spatial autocorrelation, spatial clustering, spatial co-location rules, spatial classification and prediction