Study On Similarity Query Over Time Series Data

Posted on:2008-08-29

Degree:Master

Type:Thesis

Country:China

Candidate:X Q Zuo

Full Text:PDF

GTID:2178360242994050

Subject:Software engineering

Abstract/Summary:

A time series is a sequence of values, each associated with a time point. Such data are ubiquitous in real-world applications, e.g. stock prices, population figures, temperature records, customer shopping sequences, and multi-dimensional trajectories of moving objects. Analyzing these data can uncover potentially valuable knowledge, such as how things evolve and how they correlate with one another, and is therefore important for supporting scientific decision-making.

This thesis studies similarity query over time series data, an essential technique for database and data mining applications. Specifically, it addresses the symbolization of time series, similarity measure models, and effective similarity measures for sequences. The main contributions of this thesis are as follows:

(1) Accurate symbolization of time series

A symbolization approach based on local segmentation is proposed to address the inaccuracy introduced by the typical sliding-window segmentation. Experimental results show that the approach outperforms the sliding-window one. A similarity measure for the symbolized representation is also proposed. Hierarchical clustering with the proposed approach achieves higher accuracy than other methods.

(2) Hierarchical model for measuring similarity of time series

The idea that points in the same hierarchy can be compared is proposed for similarity measurement on time series. Based on this idea, a hierarchical similarity model is proposed, and two practical methods are implemented using the Fast Fourier Transform (FFT). To speed up the search process, an efficient filtering method is provided. Experimental results on k-NN query and clustering show the superiority of the approach over its competitors. Tests of the running time and filtering power of the proposed filtering method demonstrate its better efficiency, and thus its better feasibility in real applications.

(3) Effective editing similarity measure on symbolic sequences

The data dependence issue is explored to improve the effectiveness of edit-distance-based similarity measures for symbolic sequences. A definition of editing similarity is given that quantifies the effects of the operated data and the surrounding data on the estimated content difference. An information-theoretic explanation is provided to justify the proposed technique. To speed up the search process, a lower-bounding method is introduced. Empirical studies indicate that the proposed approach significantly improves the effectiveness of the similarity measure.
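
As a point of reference for the edit-distance contribution, the sketch below shows the standard unit-cost edit (Levenshtein) distance on symbol sequences, together with a very simple lower bound used to skip candidates during nearest-neighbor search. It is not the editing similarity or the lower-bounding method defined in the thesis, whose details the abstract does not give; the function names `edit_distance`, `length_lower_bound`, and `nearest_neighbor` are illustrative assumptions.

```python
from typing import Iterable, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Textbook unit-cost Levenshtein distance (assumption: every
    insert, delete, and substitute costs 1, regardless of context)."""
    prev = list(range(len(b) + 1))  # distances from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def length_lower_bound(a: Sequence[str], b: Sequence[str]) -> int:
    """A cheap lower bound: at least |len(a) - len(b)| inserts or
    deletes are needed, so this never exceeds the true distance."""
    return abs(len(a) - len(b))


def nearest_neighbor(
    query: Sequence[str], candidates: Iterable[Sequence[str]]
) -> Tuple[Optional[Sequence[str]], Optional[int], int]:
    """Return (best candidate, its distance, number of full distance
    computations). Candidates whose lower bound already reaches the
    best distance so far are skipped without a full computation."""
    best, best_d, computed = None, None, 0
    for c in candidates:
        if best_d is not None and length_lower_bound(query, c) >= best_d:
            continue  # cannot be strictly closer than the current best
        d = edit_distance(query, c)
        computed += 1
        if best_d is None or d < best_d:
            best, best_d = c, d
    return best, best_d, computed
```

Because the lower bound never overestimates the true distance, skipping a candidate on that basis cannot change the answer; it only avoids the quadratic-time computation. A tighter bound would prune more candidates at the same guarantee.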

Keywords/Search Tags:

time series, similarity query, similarity measure, edit distance

Related items

1	Research On The Similarity-Based Time Series Data Mining
2	Time Series Similarity Search Based On Adaptive Cost Dynamic Time Warping Distance
3	Query Processing Techniques Based On Time Series Analysis
4	Research On Feature Representation And Similarity Measure Methods In Time Series Data Mining
5	Research On Periodic Time Series Clustering Analysis And Forecasting Method Based On Density Measure
6	A Similarity Measure And Application Research For RSS Time Series
7	Approximate String Matching And Optimization Techniques Using Edit Distance
8	Parallel Query Optimization For InfluxDB Time Series Database
9	Research On Uncertain Time Series Similarity Matching
10	Research On Keyword Spotting Based On DMLS