
Study On Similarity Query Over Time Series Data

Posted on: 2008-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Zuo
Full Text: PDF
GTID: 2178360242994050
Subject: Software engineering
Abstract/Summary:
A time series is a sequence of values in which each value corresponds to a time point. Such data are ubiquitous in real-world applications, e.g. stock prices, population figures, temperature readings, customer shopping sequences, and multi-dimensional trajectories of moving objects. Analyzing these data can uncover potentially valuable knowledge: it helps reveal how phenomena evolve and how they correlate with one another, and thus supports scientific decision-making.

This thesis studies similarity query over time series data, an essential technique for database and data mining applications. Specifically, it addresses three problems: the symbolization of time series, a similarity measure model, and an effective similarity measure for symbolic sequences. The main contributions are as follows:

(1) Accurate symbolization of time series
A symbolization approach based on local segmentation is proposed to overcome the inaccuracy of the typical sliding-window segmentation. Experimental results confirm that it outperforms the sliding-window approach. A similarity measure for the symbolized representation is also proposed; hierarchical clustering using this approach achieves higher accuracy than other methods.

(2) Hierarchical model for measuring similarity of time series
The idea that points in the same hierarchy can be compared is proposed for measuring similarity between time series. Based on this idea, a hierarchical similarity model is proposed, and two practical methods are implemented using the Fast Fourier Transform (FFT). To speed up the search process, an efficient filtering method is provided. Experimental results on k-NN queries and clustering show that the approach outperforms competing methods. Tests of running time and of the filtering power of the proposed filtering method demonstrate its efficiency, and hence its feasibility in real applications.

(3) Effective editing similarity measure on symbolic sequences
The issue of data dependence is explored to improve the effectiveness of edit-distance-based similarity measures for symbolic sequences. An editing similarity is defined that quantifies the effect of both the edited data and the surrounding data on the estimated content difference. An information-theoretic explanation is provided to justify the proposed technique. To speed up the search process, a lower-bounding method is introduced. Empirical studies indicate that the proposed approach significantly improves the effectiveness of the similarity measure.
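
To make the lower-bounding idea concrete, here is a minimal Python sketch of nearest-neighbor search over symbolic sequences. It is not the thesis's editing similarity or its lower bound, neither of which is specified in this abstract. It assumes the classic unit-cost edit (Levenshtein) distance and the well-known bound that the edit distance is at least the difference in sequence lengths. The function names `edit_distance`, `length_lower_bound`, and `nearest_neighbor` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic unit-cost edit (Levenshtein) distance.

    Assumption: a stand-in for the thesis's editing similarity,
    which the abstract does not define.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def length_lower_bound(a: Sequence[str], b: Sequence[str]) -> int:
    """Cheap bound: at least |len(a) - len(b)| insertions or deletions are needed."""
    return abs(len(a) - len(b))


def nearest_neighbor(query: Sequence[str],
                     database: Sequence[Sequence[str]]) -> Tuple[int, Optional[int], int]:
    """Return (best index, best distance, number of full distance computations)."""
    best_idx, best_dist, computed = -1, None, 0
    for idx, cand in enumerate(database):
        # Filter step: if the bound already reaches the best distance found,
        # the candidate cannot be strictly closer, so skip the costly computation.
        if best_dist is not None and length_lower_bound(query, cand) >= best_dist:
            continue
        d = edit_distance(query, cand)
        computed += 1
        if best_dist is None or d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist, computed


if __name__ == "__main__":
    symbols = ["abba", "abcabcabcabc", "abca", "a"]
    print(nearest_neighbor("abca", symbols))
```

Because the bound never exceeds the true distance, the filter cannot discard the true nearest neighbor; it only saves full distance computations. A tighter bound prunes more candidates, which is presumably the motivation for a measure-specific lower bound.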
Keywords/Search Tags: time series, similarity query, similarity measure, edit distance