Font Size: a A A

Efficient similarity search in time series data

Posted on:2008-03-07Degree:Ph.DType:Dissertation
University:The Chinese University of Hong Kong (Hong Kong)Candidate:Zhou, MiFull Text:PDF
GTID:1448390005956544Subject:Computer Science
Abstract/Summary:
Time series data is ubiquitous in real world, and the similarity search in time series data is of great importance to many applications. This problem consists of two major parts: how to define the similarity between time series and how to search for similar time series efficiently. As for the similarity measure, the Euclidean distance is a good starting point; however, it also has several limitations. First, it is sensitive to the shifting and scaling transformations. Under a geometric model, we analyze this problem extensively and propose an angle-based similarity measure which is invariant to the shifting and scaling transformations. We then extend the conical index to support for the proposed angle-based similarity measure efficiently. Besides the distortions in amplitude axis, the Euclidean distance is also sensitive to the distortion in time axis; Dynamic Time Warping (DTW) distance is a very good similarity measure which is invariant to the time distortion. However, the time complexity of DTW is high which inhibits its application on large datasets. The index method under DTW distance is a common solution for this problem, and the lower-bound technique plays an important role in the indexing of DTW. We explain the existing lower-bound functions under a unified frame work and propose a group of new lower-bound functions which are much better. Based on the proposed lower-bound functions, an efficient index structure under DTW distance is implemented. In spite of the great success of DTW, it is not very suitable for the time scaling search problem where the time distortion is too large. We modify the traditional DTW distance and propose the Segment-wise Time Warping (STW) distance to adapt to the time scaling search problem. Finally, we devise an efficient search algorithm for the problem of online pattern detection in data streams under DTW distance.
Keywords/Search Tags:Time, Search, Similarity, DTW distance, Data, Efficient, Problem
Related items