
A framework for temporal and spatial data mining and its applications

Posted on: 2008-03-13
Degree: Ph.D.
Type: Thesis
University: Temple University
Candidate: Wang, Qiang
Full Text: PDF
GTID: 2448390005969747
Subject: Computer Science
Abstract/Summary:
Efficiently and accurately searching for similarities and discovering interesting patterns in very large temporal and spatial databases is an important but non-trivial data mining problem with applications in many domains, including economics, medical imaging, biology, meteorology and astrophysics. Several intrinsic characteristics of temporal and spatial data differentiate them from traditional data: ambiguity of higher-order features, temporal (spatial) autocorrelations, interestingness of patterns, etc. In this thesis, we address fundamental issues arising in time series analysis. In particular, we propose an array of techniques that can be applied to different stages of time series analysis, from dimensionality reduction and data representation to similarity measures and frequent pattern discovery. Since spatial data can be converted into time series through certain transformations, these techniques can also be used for spatial data analysis.

Due to the high dimensionality of time series and the large number of instances stored in time series databases, it is usually not feasible to perform indexing and similarity searching on the time series directly. A common practice is to perform a preprocessing step of dimensionality reduction and then use spatial access methods such as R-trees for indexing and quick search. Unlike most existing techniques, which are designed to lower-bound the distance measure (typically the Euclidean distance) in the original space in order to guarantee no false dismissals, the techniques proposed in this thesis focus on answering queries faster and providing more accurate representations. Given a dataset, a codebook is created with Vector Quantization. The time series are then encoded with a new symbolic representation of much lower dimensionality. After dimensionality reduction, the time series can be indexed and retrieved very efficiently.
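As a rough illustration of the codebook step described above, the sketch below builds a codebook with plain k-means over fixed-length segments and encodes each series as a sequence of codeword indices. The segment length, the use of k-means, and all function names (`segments`, `build_codebook`, `encode`) are assumptions made for this example; the abstract does not say how the thesis constructs its codebook or its symbolic representation.

```python
import random

def segments(series, seg_len):
    """Cut a series into consecutive, non-overlapping segments of seg_len points."""
    return [tuple(series[i:i + seg_len])
            for i in range(0, len(series) - seg_len + 1, seg_len)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length segments."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, seg):
    """Index of the codeword closest to seg (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], seg))

def build_codebook(train_segments, k, iters=20, seed=0):
    """Hypothetical codebook training: k-means (Lloyd's algorithm) on segments."""
    rng = random.Random(seed)
    codebook = rng.sample(train_segments, k)  # requires k <= len(train_segments)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for seg in train_segments:
            buckets[nearest(codebook, seg)].append(seg)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous codeword
                codebook[j] = tuple(sum(col) / len(bucket) for col in zip(*bucket))
    return codebook

def encode(series, codebook, seg_len):
    """Replace each segment by the index of its nearest codeword."""
    return [nearest(codebook, seg) for seg in segments(series, seg_len)]

# Toy usage: two series of 12 points become 3 symbols each.
series_a = [0, 0, 0, 0, 10, 10, 10, 10, 0, 0, 0, 0]
series_b = [10, 10, 10, 10, 0, 0, 0, 0, 10, 10, 10, 10]
train = segments(series_a, 4) + segments(series_b, 4)
codebook = build_codebook(train, k=2)
code_a = encode(series_a, codebook, 4)
code_b = encode(series_b, codebook, 4)
```

In this toy setting each 12-point series shrinks to three symbols, which is the kind of reduced representation that could then be indexed or compared cheaply.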
Three distance measures are introduced for use under different circumstances: RoughDist is appropriate for approximate but very fast searches with possible false dismissals; LB RoughDist guarantees no false dismissals but is more computationally expensive; Histogram Model distance is the choice when Euclidean distance does not perform well. A hierarchical mechanism is also introduced to collect information from different resolution levels. Besides improving search accuracy, the multi-resolution representation provides a good way to summarize the time series and helps in finding frequent patterns. To handle time series of different lengths, we introduce an algorithm that transforms the matching problem into a shortest path problem in a directed acyclic graph. This algorithm can automatically find the best matching part of the target for a query and eliminate the effects of noise.

For spatial data, our analysis focuses on Regions of Interest (ROIs). The ROIs are transformed into time series with a locality-preserving space-filling curve. The spatial data can then be analyzed using the techniques we introduce for time series analysis. In addition to developing techniques for detecting associations among spatial and non-spatial data, we evaluate the efficiency of different association mining techniques for spatial data. For this purpose we extend a spatial simulator that can be used to model spatial regions and associations between spatial and non-spatial predicates. We obtain measures of recovery of known associations as a function of various parameters, including sample number, association strength, number and type of association, association degree, prior probabilities of spatial predicates and spatial normalization error.

The techniques proposed in this thesis have general applicability to temporal and spatial data in various formats and application domains.
In particular, we have tested the framework in the analysis of different medical image datasets, including fMRI...
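To make the ROI-to-time-series step mentioned in the abstract more concrete, here is a minimal sketch that linearizes a square 2D grid along a Hilbert curve. The choice of the Hilbert curve is an assumption: the abstract only says "locality preserving space filling curve" and does not name one. The function names `hilbert_index` and `grid_to_series` are hypothetical, and the grid side is assumed to be a power of two.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve covering an n x n grid.

    n must be a power of two. This is the standard iterative conversion
    from grid coordinates to curve distance.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def grid_to_series(grid):
    """Read the values of a square grid (list of rows) in Hilbert-curve order."""
    n = len(grid)
    cells = [(hilbert_index(n, x, y), grid[y][x])
             for y in range(n) for x in range(n)]
    return [value for _, value in sorted(cells)]

# Toy usage: a 2 x 2 grid read along the curve.
series = grid_to_series([[1, 2],
                         [4, 3]])
```

Cells that are consecutive along the curve are always neighbors in the grid, which is why a curve of this kind keeps much of the spatial locality when a region is flattened into a one-dimensional sequence.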
Keywords/Search Tags:Data, Spatial, Time series, Different, Mining, Techniques