Mining co-location patterns from large spatial datasets

Posted on:2004-11-25

Degree:Ph.D

Type:Thesis

University:University of Minnesota

Candidate:Huang, Yan


GTID:2468390011475964

Subject:Computer Science

Abstract/Summary:

Spatial data mining is the process of discovering implicit, non-trivial, and potentially useful information from large spatial datasets. It encompasses a wide range of techniques for computing and analyzing geographic data. This thesis addresses one spatial data mining task: co-location pattern discovery.

Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together in geographic space. For example, symbiotic plant species and predator-prey animal species are likely co-locations in ecology datasets. The co-location rule discovery problem is different from the association rule discovery problem. Even though the boolean spatial features may be considered as item types, there is no natural notion of transactions. Partitioning spatial datasets into transactions can lead to incorrect estimates of the interest measures for many spatial co-location patterns whose instances lie near transaction boundaries. This makes it difficult to use traditional interestingness measures, e.g., support, and traditional association rule mining algorithms, which are based on ideas like support-based pruning and compression of transaction data.

The first part of this thesis formalizes the notion of co-locations using user-specified spatial neighborhoods in place of transactions. It defines new interest measures based on the neighborhoods, along with a model for interpreting the co-location rules. It provides a correct and complete algorithm for mining co-location rules.

The second part of the thesis focuses on reducing computational costs by exploring alternative join strategies and new filtering methods. An experimental study has been conducted to evaluate the new methods.

The use of prevalence-based pruning to gain computational efficiency makes it difficult to discover high-confidence, low-prevalence (HCLP) co-location rules, which are of interest in many application domains. The third part of the thesis focuses on the development of interest measures and algorithms to mine HCLP co-location rules in a computationally efficient manner.

Lastly, co-location rules share a well-known difficulty of unsupervised learning methods: ascertaining the validity of inferences. Interest measures for co-location patterns are compared with traditional spatial statistical measures, e.g., the cross-K function, to provide an independent assessment of the quality of the patterns.
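
To illustrate how a neighborhood can stand in for transactions, here is a minimal sketch in Python. The abstract does not name its interest measures, so the sketch assumes a participation-ratio style measure: for a pair of features, the fraction of instances of each feature that have at least one instance of the other feature within a user-specified radius, with the smaller of the two fractions taken as the prevalence of the pair. The function name, the brute-force pairwise scan, and the sample data are all hypothetical and are not taken from the thesis.

```python
from math import hypot


def pair_participation_index(points, feature_a, feature_b, radius):
    """Prevalence of the pair {feature_a, feature_b} under a distance neighborhood.

    points: iterable of (feature, x, y) tuples (assumed input layout).
    Returns the minimum, over the two features, of the fraction of that
    feature's instances that have a neighbor of the other feature.
    """
    a_pts = [(x, y) for f, x, y in points if f == feature_a]
    b_pts = [(x, y) for f, x, y in points if f == feature_b]
    if not a_pts or not b_pts:
        return 0.0

    a_hit, b_hit = set(), set()
    # Brute-force O(|A| * |B|) scan; a spatial index would replace this
    # on large datasets.
    for i, (ax, ay) in enumerate(a_pts):
        for j, (bx, by) in enumerate(b_pts):
            if hypot(ax - bx, ay - by) <= radius:
                a_hit.add(i)
                b_hit.add(j)

    ratio_a = len(a_hit) / len(a_pts)
    ratio_b = len(b_hit) / len(b_pts)
    return min(ratio_a, ratio_b)


# Hypothetical toy dataset: three "plant" and four "fungus" instances.
points = [
    ("plant", 0.0, 0.0),
    ("plant", 1.0, 0.0),
    ("plant", 10.0, 10.0),
    ("fungus", 0.0, 1.0),
    ("fungus", 1.5, 0.0),
    ("fungus", 20.0, 20.0),
    ("fungus", 30.0, 30.0),
]

index = pair_participation_index(points, "plant", "fungus", radius=1.0)
print(index)
```

Because neighbors are found by distance rather than by membership in a fixed grid cell, two instances that sit on opposite sides of an arbitrary cell boundary are still counted together, which is the estimation problem the abstract attributes to transaction-based approaches.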

Keywords/Search Tags:

Spatial, Co-location, Mining, Data, Patterns, Measures

Related items

1	Field-Driven Space-Efficient Co-Location Pattern Mining
2	Mining Spatiotemporal Sub-prevalent Co-location Patterns Based On Star Model
3	Ontology-Based Approaches For Interactively Mining Co-location Patterns
4	Research On Algorithm Of Positive/Negative Co-Location Pattern Mining In Spatial Data
5	The Design Of Prevalent Spatial Co-location Patterns Compression Method
6	Ambiguity Within The Scope Of Threshold Co - The Location Of Fuzzy Object Pattern Mining
7	Mining Spatial Rough Set Theory Co-Location Mode
8	Coupling Co-location Patterns On Spatial Data Sets And Its Mining Methods
9	Mining Local Tight Spatial Sub-prevalent Co-location Pattern
10	Interactive Spatial Co-location Pattern Mining Based On Machine Learning