Finding subsequences with similar trends in a sequence data set is a key technique in sequence data mining, with important applications in fields such as finance, healthcare, meteorology, and network security. Subsequence queries generally use Dynamic Time Warping (DTW) as the similarity measure. However, DTW has high time complexity, which makes online queries difficult when the query subsequence is long. Time series representation methods can effectively reduce the time overhead of a query by reducing the dimensionality of the sequence. This paper therefore combines time series representation with a similarity measure to solve the problem of fast similar-subsequence queries over time series data. The specific research contents are as follows: (1) An algorithm, MONEX (Modify ONline EXploration of time series), is proposed for fast querying of long subsequences. First, all subsequences of a certain length in the data set are grouped, and the representative subsequences are marked. Second, during the query, the query sequence is divided into short sequences of a specified length, and the DTW algorithm is used to determine candidate sets of subsequences similar to these short sequences. Finally, the candidate sets are spliced together to obtain the query result sequences. Extensive experiments on real data sets show that the proposed MONEX algorithm is nearly 10 times more efficient than the state-of-the-art algorithm. (2) The grouping of subsequences (the time series representation process) uses the Euclidean Distance (ED) to measure the similarity between subsequences and then groups them according to the similarity results. This paper proves that a triangle inequality holds between ED and DTW; therefore, applying the DTW algorithm after sequence representation can ensure the accuracy of the query. (3) To meet query requirements under different similarity levels, rules for dividing the query level are proposed. Extensive experiments show that when the similarity threshold is 0.2 and the sequence segmentation length is 20, the algorithm executes efficiently with high accuracy.
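
The two distance measures and the grouping step described above can be illustrated with a minimal Python sketch. It is not the MONEX implementation. The function names (`euclidean`, `dtw`, `group_subsequences`), the squared point-wise cost, the greedy first-fit grouping rule, and the use of a raw (unnormalized) ED threshold are all assumptions made for illustration. The sketch also checks one well-known property of equal-length sequences, namely that DTW never exceeds ED under the same point-wise cost, because the diagonal alignment is itself a valid warping path; this is related to, but not the same as, the triangle inequality proved in the paper.

```python
import math


def euclidean(a, b):
    """Euclidean Distance (ED) between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ED requires sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Classic dynamic-programming DTW with squared point-wise cost.

    Runs in O(len(a) * len(b)) time, which is the cost that makes
    long online queries expensive.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessors.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def group_subsequences(series, length, threshold):
    """Hypothetical greedy grouping of all windows of a fixed length.

    Each window joins the first group whose representative lies within
    `threshold` (raw ED, an assumption); otherwise it starts a new group
    and becomes that group's representative.
    """
    groups = []  # list of (representative, [start indices])
    for start in range(len(series) - length + 1):
        window = series[start:start + length]
        for rep, members in groups:
            if euclidean(rep, window) <= threshold:
                members.append(start)
                break
        else:
            groups.append((window, [start]))
    return groups


# A time-shifted copy is identical under DTW but not under ED.
a = [0, 1, 2, 3, 3]
b = [0, 0, 1, 2, 3]
assert dtw(a, b) <= euclidean(a, b)

groups = group_subsequences([0, 0, 0, 0, 5, 5, 5, 5], length=2, threshold=0.2)
```

In a setup like this, only the group representatives would need to be compared against each short query segment with DTW, which is where the reduction in query time would come from.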