Research About Similarity Search On Time Series Stream Data

Posted on: 2020-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Ding
GTID: 2428330572484273
Subject: Computer Science and Technology
Abstract/Summary:
Similarity-based time series retrieval has long been a subject of study. It is one of the basic, core problems in time series data mining and is widely used in financial data analysis, meteorological data prediction, multimedia data retrieval, medical data anomaly detection, and other fields. This thesis addresses the problem of finding time series similar to a query sequence over high-speed time series stream data.

The similarity search problem for time series stream data can be roughly divided into two phases: time series representation and similarity search. In the representation phase, dimensionality reduction is applied to the original time series to reduce the time and space cost of the subsequent search, filter out noise, and improve search efficiency and accuracy. In the similarity search phase, efficient search techniques built on the representation are combined with similarity calculations to find the set of similar results. Based on an analysis of the latest research on time series data mining at home and abroad, this thesis studies the piecewise linear representation of time series and the key techniques for efficient similarity search. The main contributions are as follows:

1. We analyze two kinds of representation, piecewise linear approximation and piecewise aggregate approximation. Combining the segmentation steps of two representative methods, the multi-resolution important point search representation (MIP) and the piecewise aggregate approximation (PAA), we propose an important-point-based average segmentation algorithm that divides a time series into segments. Compared with MIP, the algorithm has lower time complexity and higher computational efficiency. Compared with PAA, it retains the important feature points of the time series, which prepares for the subsequent similarity search.

2. Based on this segmentation, we propose a multi-resolution exact search strategy. We introduce a novel multi-resolution representation for time series stream data that can be stored in a binary search tree. Moving from low to high resolution, the lower-bound distance is calculated incrementally and the candidate sequences are filtered step by step. The experimental results show that most candidate sequences can be filtered out at a low resolution, which saves computational cost and improves search efficiency.

3. Based on the same segmentation, we propose an efficient approximate search strategy that performs approximate similarity search on the time series. The approximate distance preserves the slope characteristics of the segmented sequence, which improves filtering efficiency, and the experimental results show that the search results are more accurate. However, the approximate distance does not satisfy the lower-bounding property, so there is no guarantee against false dismissals (similar sequences that go unreported).

Compared with the traditional strategy of building an index for time series similarity search, the two strategies proposed in this thesis focus on incremental calculation to adapt to the way stream data changes and updates, and they avoid both the curse of dimensionality that affects indexing and the high cost of updating an index. Experiments show that under both strategies, similarity search on time series stream datasets is more accurate than with existing strategies, and search efficiency is improved.
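
The coarse-to-fine lower-bound filtering described in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes plain PAA with equal-length segments rather than the important-point-based segmentation, recomputes each resolution from scratch instead of incrementally, and omits the binary search tree. The function names, the `resolutions` tuple, and the range-query formulation with a `radius` are all hypothetical choices made for illustration. The sketch relies only on the standard PAA bound, under which the scaled distance between segment means never exceeds the true Euclidean distance.

```python
import math

def paa(series, segments):
    """Piecewise aggregate approximation: mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa_lower_bound(query, candidate, segments):
    """sqrt(n / segments) * ||PAA(q) - PAA(c)||, a lower bound on euclidean(q, c)."""
    n = len(query)
    reduced = euclidean(paa(query, segments), paa(candidate, segments))
    return math.sqrt(n / segments) * reduced

def range_search(query, candidates, radius, resolutions=(1, 2, 4, 8)):
    """Return candidates within `radius` of `query`, pruning from coarse to fine.

    `resolutions` is an assumed schedule of segment counts; each must divide
    the series length.
    """
    results = []
    for cand in candidates:
        pruned = False
        for segments in resolutions:
            # If even the lower bound exceeds the radius, the true distance does too.
            if paa_lower_bound(query, cand, segments) > radius:
                pruned = True
                break
        # Survivors of every resolution are verified with the exact distance.
        if not pruned and euclidean(query, cand) <= radius:
            results.append(cand)
    return results
```

Because the bound never overestimates the true distance, pruning on it cannot discard a sequence that actually lies within the radius, which is why a search of this kind stays exact. A distance without that property, such as the slope-preserving approximate distance mentioned in contribution 3, can prune true matches.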
Keywords/Search Tags: time series, stream data, similarity search, multi-resolution