Research And Implementation Of Time Series Similarity Connection Algorithm

Posted on: 2020-05-22  Degree: Master  Type: Thesis
Country: China  Candidate: L L Chen
GTID: 2430330626463981  Subject: Software engineering
Abstract/Summary:
A time series is a sequence of data items or observations ordered in time. Time series are characterized by large data volume, high dimensionality, and real-time updating. They exist widely in real-world fields, and the information hidden in them can reveal important facts about how things develop. As time series emerge in more and more fields, time series analysis has become increasingly popular. Time series similarity join is a primitive operation that retrieves all pairs of similar or correlated subsequences from two time series. At present, time series similarity join faces three main problems: first, how to preprocess the original time series; second, how to select the similarity measure function; and third, how to analyze large-scale time series efficiently and accurately. This thesis addresses similarity join of time series in these three aspects and implements an efficient algorithm on distributed platforms.

First, for preprocessing, Z-normalization is used to standardize the original data. Second, for the similarity measure, the Pearson correlation coefficient is chosen: it is commonly used in time series data mining because of several useful mathematical properties, such as invariance to scale and offset. Joining two time series on their subsequences with the Pearson correlation coefficient can provide more important information than other similarity metrics. In the era of big data, time series analysis techniques require high execution efficiency and scalability to cope with large-scale time series data, and research on time series similarity join is continuously being optimized to improve that efficiency. With the widespread use of distributed and parallel platforms, their computing capabilities offer significant benefits for processing and analyzing large-scale time series.

Based on the above problems, this thesis proposes a parallel time series join algorithm and implements it on different parallel platforms. The main contributions are as follows:
1) To improve the efficiency of computing the Pearson correlation coefficient between two time series, a parallel FFT algorithm is proposed.
2) The time series join algorithm is implemented on the Spark platform. To adapt to the properties of a distributed platform, a time series segmentation approach is proposed, and to reduce the time spent scanning the dot product matrix, a matrix partition method is proposed. Extensive experiments were performed on different real datasets and their results analyzed; they demonstrate the efficiency and scalability of the algorithm across datasets.
3) The time series join algorithm is extended to the MapReduce platform and verified on different datasets. The results verify the efficiency and performance of the algorithm on MapReduce.
4) The join algorithm is applied to time series motif discovery, and some of the discovered motifs are presented to demonstrate the effectiveness of the proposed algorithm for motif discovery.
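The core computation described above (Pearson correlation between every pair of subsequences, obtained from FFT-based dot products) can be illustrated with a minimal, sequential NumPy sketch. It is not the thesis's parallel FFT algorithm, nor its Spark or MapReduce implementation. The function names (`znorm`, `sliding_dot_fft`, `pearson_matrix`, `segmented_pearson_matrix`, `similarity_join`), the subsequence length `m`, the segment length `seg_len`, and the fixed correlation threshold are hypothetical choices made for illustration. The overlapping-segment scheme is one plausible way, assumed here rather than taken from the thesis, to split the second series so that segments can be joined independently. The sketch assumes NumPy 1.20 or later for `sliding_window_view`.

```python
# Hypothetical sketch: function names, the segmentation scheme, and the threshold
# are illustrative assumptions, not the thesis's implementation.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def znorm(x):
    """Z-normalize a series: zero mean, unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def sliding_dot_fft(q, t):
    """Dot product of q with every length-len(q) window of t, via one FFT convolution."""
    m, n = len(q), len(t)
    size = n + m  # >= n + m - 1, so circular convolution equals linear convolution
    conv = np.fft.irfft(np.fft.rfft(q[::-1], size) * np.fft.rfft(t, size), size)
    return conv[m - 1:n]  # entry s equals sum_k q[k] * t[s + k]


def pearson_matrix(a, b, m):
    """Pearson correlation of every length-m subsequence of a with every one of b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    wa, wb = sliding_window_view(a, m), sliding_window_view(b, m)
    mu_a, sd_a = wa.mean(axis=1), wa.std(axis=1)
    mu_b, sd_b = wb.mean(axis=1), wb.std(axis=1)
    corr = np.zeros((len(wa), len(wb)))
    for i in range(len(wa)):  # one row of the dot-product matrix per subsequence of a
        qt = sliding_dot_fft(a[i:i + m], b)
        denom = m * sd_a[i] * sd_b
        with np.errstate(divide="ignore", invalid="ignore"):
            # rho = (sum xy - m * mu_x * mu_y) / (m * sd_x * sd_y)
            row = (qt - m * mu_a[i] * mu_b) / denom
        corr[i] = np.where(denom > 0, row, 0.0)  # constant windows get correlation 0
    return corr


def segmented_pearson_matrix(a, b, m, seg_len):
    """Same matrix, built from overlapping segments of b that could be joined independently.

    Segment k is b[s : s + seg_len + m - 1] with s = k * seg_len; the m - 1 overlap
    ensures each length-m subsequence of b is computed in exactly one segment and
    no subsequence is cut at a segment boundary.
    """
    b = np.asarray(b, dtype=float)
    blocks = [pearson_matrix(a, b[s:s + seg_len + m - 1], m)
              for s in range(0, len(b) - m + 1, seg_len)]
    return np.hstack(blocks)


def similarity_join(a, b, m, threshold):
    """All (i, j) subsequence start pairs whose Pearson correlation is >= threshold."""
    corr = pearson_matrix(a, b, m)
    return [(int(i), int(j)) for i, j in np.argwhere(corr >= threshold)]
```

Each row of the matrix depends only on one query subsequence, and each segment only on its own slice of `b`, so rows or segments could in principle be distributed across workers. Because Pearson correlation is invariant to scale and offset, applying `znorm` to either series leaves the matrix unchanged. For motif discovery, `np.unravel_index(corr.argmax(), corr.shape)` gives the most correlated pair of subsequences as a candidate motif.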
Keywords/Search Tags: Time Series, Similarity Join, Pearson Correlation Coefficient, Parallel, Data Segmentation