Research On The Similarity-Based Time Series Data Mining

Posted on: 2008-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: X M Lu
Full Text: PDF
GTID: 2178360215462604
Subject: Control theory and control engineering
Abstract/Summary:
A time series is a sequence of observations ordered in time. Such data arise in many fields, including industry, economics, finance, scientific observation, and social science. How to manage and use time series data efficiently is an interesting problem. Classical time series analysis first proposes a hypothesis and then tests its validity, which makes it unsuitable for discovery tasks. Time series data mining, by contrast, can extract hidden and potentially useful knowledge from large amounts of data that users might otherwise overlook.

This thesis addresses similarity-based time series data mining, covering representation methods, similarity measures, similarity searching and index structures for time series, and a prototype time series data mining system. The main work and contributions of this thesis include:

1. A representation method, Segmented Extreme Value Extraction, is proposed. Unlike traditional representations, it captures both the overall trend and the local features of a time series at the same time. It is built on Piecewise Linear Representation and the Landmark Model. Experiments confirm its correctness and high efficiency.

2. A new similarity measure based on the Segmented Extreme Value Dynamic Time Warping (SEDTW) distance is proposed. SEDTW distance is an effective measure for time series that differ by scaling and warping along the time axis. It divides a time series into several segments, extracts the extreme values in each segment, and then compares the resulting extreme value series using the dynamic time warping (DTW) distance. Compared with the classical DTW distance, the new method is much faster with almost no loss of accuracy, a conclusion supported by the experiments in this thesis.

3. Similarity searching based on DTW distance in a time series database is studied. The thesis first uses an R*-tree as the index structure of the time series database in order to improve search efficiency. It then searches for similar series along the R*-tree index using the DTW distance. Both whole matching and subsequence matching algorithms are implemented.

4. An integrated framework for a time series data mining prototype system is proposed. The framework is composed of functional units and flexible interfaces. It can support various advanced integrated services, such as pattern mining and similarity searching.
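
The segment-then-warp idea behind the SEDTW distance can be illustrated with a minimal Python sketch. The abstract does not give the thesis's exact procedure, so several details here are assumptions: segments have a fixed length (`segment_len`), each segment contributes its minimum and maximum in order of occurrence, and the local cost in DTW is the absolute difference. The function names `segment_extremes`, `dtw_distance`, and `sedtw_distance` are hypothetical and chosen only for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def segment_extremes(series, segment_len):
    """Assumed reduction: keep the min and max of each fixed-length segment,
    in the order they occur, so the reduced series preserves temporal order."""
    reduced = []
    for start in range(0, len(series), segment_len):
        seg = series[start:start + segment_len]
        lo = min(range(len(seg)), key=seg.__getitem__)
        hi = max(range(len(seg)), key=seg.__getitem__)
        # A constant segment yields a single point (lo == hi).
        for idx in sorted({lo, hi}):
            reduced.append(seg[idx])
    return reduced


def sedtw_distance(a, b, segment_len=4):
    """DTW computed on the extreme-value series instead of the raw series."""
    return dtw_distance(segment_extremes(a, segment_len),
                        segment_extremes(b, segment_len))
```

Because each segment of length `segment_len` is reduced to at most two points, the quadratic DTW table shrinks accordingly, which is consistent with the speed-up the abstract reports; how much accuracy is retained depends on the segmentation actually used in the thesis.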
Keywords/Search Tags: time series, time series data mining, similarity measure, DTW distance, SEDTW distance, similarity searching