Mining co-location patterns from large spatial datasets

Posted on: 2004-11-25
Degree: Ph.D.
Type: Thesis
University: University of Minnesota
Candidate: Huang, Yan
Full Text: PDF
GTID: 2468390011475964
Subject: Computer Science
Abstract/Summary:
Spatial data mining is the process of discovering implicit, non-trivial, and potentially useful information from large spatial datasets. It encompasses a wide range of techniques for computing and analyzing geographic data. This thesis deals with a task of spatial data mining, namely co-location pattern discovery.

Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together in geographic space. For example, symbiotic plant species and predator-prey animal species are likely co-locations in Ecology datasets. The co-location rule discovery problem is different from the association rule discovery problem. Even though the boolean spatial features may be considered as item types, there is no natural notion of transactions. Transactioning spatial datasets can lead to incorrect estimation of the interest measures for many spatial co-location patterns with instances near transaction boundaries. This makes it difficult to use traditional interestingness measures, e.g., support, and traditional association rule mining algorithms, which are based on ideas like support-based pruning and compression of transaction data.

The first part of this thesis formalizes the notion of co-locations using user-specified spatial neighborhoods in place of transactions. It defines new interest measures based on the neighborhoods along with a model for interpreting the co-location rules. It provides a correct and complete algorithm for mining co-location rules.

The second part of the thesis focuses on reducing computational costs by exploring alternative join strategies and new filtering methods. An experimental study has been conducted to evaluate the new methods.

The use of prevalence-based pruning to gain computational efficiency makes it difficult to discover high-confidence, low-prevalence (HCLP) co-location rules, which are of interest in many application domains. The third part of the thesis focuses on the development of interest measures and algorithms to mine HCLP co-location rules in a computationally efficient manner.

Lastly, co-location rules suffer from a well-known difficulty associated with unsupervised learning methods in the area of ascertaining validity of inferences. A comparison of interest measures for co-location patterns with traditional spatial statistical measures, e.g., cross K function, is made to provide an independent assessment of quality of patterns.
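To make the idea of a neighborhood-based interest measure concrete, here is a minimal Python sketch. The abstract does not name or define its measures, so this uses the participation index as commonly described in the co-location literature, restricted to two-feature patterns and computed by brute force. The function name `participation_index`, the `(feature, x, y)` point layout, the Euclidean distance threshold `radius`, and the sample data are all assumptions made for illustration, not the thesis's algorithm.

```python
from math import dist


def participation_index(points, pattern, radius):
    """Participation index of a two-feature co-location pattern.

    points  -- list of (feature, x, y) tuples (hypothetical layout)
    pattern -- pair of feature names, e.g. ("A", "B")
    radius  -- neighborhood distance threshold (assumed Euclidean)
    """
    feat_a, feat_b = pattern
    inst_a = [(x, y) for f, x, y in points if f == feat_a]
    inst_b = [(x, y) for f, x, y in points if f == feat_b]
    if not inst_a or not inst_b:
        return 0.0

    # Track which instances of each feature take part in at least one
    # neighboring (A, B) pair. No transactions are involved: the
    # neighborhood relation alone decides what counts as "together".
    taking_part_a, taking_part_b = set(), set()
    for i, p in enumerate(inst_a):
        for j, q in enumerate(inst_b):
            if dist(p, q) <= radius:
                taking_part_a.add(i)
                taking_part_b.add(j)

    # Participation ratio per feature; the index is the minimum ratio.
    ratio_a = len(taking_part_a) / len(inst_a)
    ratio_b = len(taking_part_b) / len(inst_b)
    return min(ratio_a, ratio_b)


# Made-up sample data: two A instances, two B instances, one C instance.
sample_points = [
    ("A", 0.0, 0.0),
    ("A", 10.0, 10.0),
    ("B", 0.0, 1.0),
    ("B", 0.5, 0.5),
    ("C", 50.0, 50.0),
]

index_ab = participation_index(sample_points, ("A", "B"), radius=2.0)
index_ac = participation_index(sample_points, ("A", "C"), radius=2.0)
```

In this sample only one of the two A instances has a B neighbor, while both B instances have an A neighbor, so the index for ("A", "B") is limited by feature A. A prevalence threshold on such an index is the kind of pruning that can hide high-confidence, low-prevalence rules.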
Keywords/Search Tags:Spatial, Co-location, Mining, Data, Patterns, Measures