Font Size: a A A

Spatio-temporal data mining for environmental science

Posted on:2011-08-02Degree:Ph.DType:Thesis
University:University of MinnesotaCandidate:Kang, James MFull Text:PDF
GTID:2468390011471378Subject:Engineering
Abstract/Summary:
Spatio-temporal (ST) data mining is concerned with modeling and mining interesting, useful, non-trivial patterns from massive datasets representing geographic phenomena such as in environmental sciences. However, discovering spatio-temporal patterns in environmental science datasets is computationally challenging for the following reasons: (1) Subsets of patterns may not follow an optimal sub-structure, (2) There may be a large search space due to the increase in combinatorics of related events, and (3) Answers may need to be provided in real-time based on a time-constraint.;To begin to tackle these challenges, we studied the problem of flow anomalies to tackle the challenge non-optimal sub-structure. In general, the flow anomaly problem can be defined as follows: given a pair of sensors on a flow network, where each sensor has a continuous set of measurements of some variable, flow anomaly discovery aims to identify periods of time that are significantly mis-matched based on some threshold. However, identifying flow anomalies is computationally hard because a single flow anomaly pattern may contain subsets that exhibit normal behavior (e.g., gaps within an oil spill), which violates one of the main principles in dynamic programming of requiring optimal sub-structure. We have proposed a SWEET (Smart Window Enumeration and Evaluation of persistent-Thresholds) approach that uses a novel statistical algebraic interest measure where only a few counters are needed to discover all the flow anomalies within the dataset. In addition, we proposed a further optimized version called SWEET-ER that reduces the complexity to linear time when identifying the dominant flow anomalies. Analytical analysis was conducted as well as experimental evaluation on synthetic and real datasets.;To address the challenge of combinatorics in ST datasets, we studied the problem of teleconnected flow anomalies within a network. Teleconnected flow anomalies may occur in river networks where a contaminant may enter and then exit. Discovering teleconnections in flow networks is computational challenging due to the large number of combinatorics of flow anomalies that may occur in ST datasets. We proposed a spatio-temporal dynamic-neighborhood model to represent the changing neighborhoods. Based on this model, we proposed a RAD (Relationship Analysis of Dynamic-neighborhoods) approach to discover teleconnected flow anomalies within a network that utilizes our spatio-temporal dynamic neighborhood model to efficiently identify the patterns. Experimental analysis conducted on both synthetic and real datasets showed RAD performed three times faster than naive alternatives.;The challenge of real-time analysis in ST datasets is addressed by exploring the problem of continuous reverse nearest neighbors in a moving object (monochromatic and bichromatic) environment. Identifying RNNs is computationally challenging since all the queries and objects may be constantly moving, objects may be inserted and/or deleted over time, and the query answer must be retrieved based on a time-constraint (i.e., to approximately mimic real-time scenarios). Thus, we proposed an IGERN (Incremental and General Evaluation of Reverse Nearest neighbors) approach that efficiently identifies continuous RNN queries for both mono- and bi-chromatic cases. In addition, we proposed a filter-and-refine approach (fr-IGERN) for the bichromatic case which further reduces the search space to identify all the reverse nearest neighbors. Experimental evaluation of both synthetic and real datasets showed that IGERN performed three times faster than naive alternatives.;The proposed work in this thesis brings together an initial step toward understanding the immense challenges and novel applications of ST datasets. In this thesis, we have formally modeled ST datasets and have begun to explore new and exciting types of patterns (e.g., flow anomaly) that may have many uses in real-world applications. We conclude this thesis by exploring additional types of patterns, its computational challenges, and possible directions for future work.
Keywords/Search Tags:Spatio-temporal, Patterns, ST datasets, Mining, Flow anomalies, Reverse nearest neighbors, Environmental
Related items