
Similarity and indexing in multidimensional spaces

Posted on: 2005-06-28
Degree: Ph.D.
Type: Dissertation
University: University of California, Riverside
Candidate: Vlachos, Michail T
GTID: 1458390011952536
Subject: Computer Science
Abstract/Summary:
In this dissertation we investigate various techniques for the analysis, summarization (online and offline), indexing, clustering, and classification of multidimensional data, as well as for similarity search over such data. Multi-attribute, or multidimensional, data are collected in many diverse fields, such as environmental, epidemiological, and medical applications. The focus of this work is on multidimensional time-series; however, the majority of the proposed methods are directly applicable to most multidimensional datasets. The methods presented here are new or improved techniques for time-series analysis, and they tackle important problems such as burst discovery, periodicity detection, similarity search, and stream summarization.

Multidimensional time-series, the focal point of this research, are prevalent in many applications, such as spatiotemporal tracking from sensor networks, video surveillance data, and motion capture video. Fast and efficient analysis and exploration of such data requires fast and robust similarity models that accurately capture flexible similarities even in the presence of noise. While some of the similarity models used can be expensive to compute, we provide tight lower and upper bounds on the distance estimate, and we prove that no false dismissals are introduced. The major contributions of this work are:
(1) We present the tightest known lower bound for the one-dimensional Euclidean distance and the Dynamic Time Warping distance between time-series (given the same disk space for sequence approximation).
(2) We propose a flexible index structure for multidimensional sequences that can accommodate multiple distance measures, such as the Euclidean, Time Warping, and Longest Common Subsequence distances, without any reconstruction or modification.
(3) We present new rotation-invariant similarity measures for multidimensional time-series.
(4) We propose techniques for time-series stream summarization that use a user-parameterized forgetful factor.
(5) We show the utility of non-linear dimensionality reduction techniques for improving the accuracy of classification schemes.
(6) Finally, we demonstrate the usefulness of our methods in a variety of applications, such as handwriting recognition, motion capture, and GPS tracking.

We empirically verify the generality of our methods by testing them on more than 60 real-world datasets.
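The abstract relies on the idea that an expensive distance can be guarded by a cheap lower bound without introducing false dismissals. The abstract does not describe the dissertation's own bound, so the sketch below instead uses the well-known LB_Keogh envelope bound for band-constrained Dynamic Time Warping, applied in a filter-and-refine range query. Any candidate whose lower bound already exceeds the threshold `eps` is discarded without computing DTW, because the bound never exceeds the true distance. The function names, the Sakoe-Chiba window width `r`, and the random-walk data are illustrative assumptions. The sequences are one-dimensional and of equal length.

```python
import math
import random


def dtw(q, c, r):
    """DTW between equal-length sequences q and c using squared point
    differences and a Sakoe-Chiba band (cells with |i - j| > r are never
    visited). Returns the square root of the cheapest cumulative cost."""
    n, m = len(q), len(c)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])


def envelope(q, r):
    """Upper/lower envelope of q: running max/min over the window [i-r, i+r]."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(c, upper, lower):
    """LB_Keogh: penalize only the points of c that fall outside the envelope."""
    total = 0.0
    for x, u, l in zip(c, upper, lower):
        if x > u:
            total += (x - u) ** 2
        elif x < l:
            total += (x - l) ** 2
    return math.sqrt(total)


def range_query(q, database, eps, r):
    """Return indices of sequences with DTW(q, c) <= eps, plus how many
    candidates needed the exact (quadratic-time) DTW computation."""
    upper, lower = envelope(q, r)
    hits, refined = [], 0
    for idx, c in enumerate(database):
        if lb_keogh(c, upper, lower) > eps:
            continue  # safe to prune: LB <= DTW, so DTW > eps as well
        refined += 1
        if dtw(q, c, r) <= eps:
            hits.append(idx)
    return hits, refined


def random_walk(n):
    """Hypothetical test data: a Gaussian random walk of length n."""
    x, out = 0.0, []
    for _ in range(n):
        x += random.gauss(0, 1)
        out.append(x)
    return out


if __name__ == "__main__":
    random.seed(0)
    db = [random_walk(64) for _ in range(200)]
    q = random_walk(64)
    hits, refined = range_query(q, db, eps=8.0, r=4)
    print(f"{len(hits)} hits; exact DTW computed for {refined} of {len(db)}")
```

Pruning is safe because every point of a candidate is matched by the warping path to some query point within its window, so it can be no closer to the query than the envelope at that position. A tighter lower bound lets fewer candidates reach the exact DTW step, which is why the tightness of a bound matters for indexing.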
Keywords/Search Tags: Multidimensional, Similarity, Methods, Data