Research On Feature Representation And Similarity Measurement Method Of Time Series

Posted on: 2020-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Cao
GTID: 2370330578963931
Subject: Software engineering
Abstract/Summary:
A time series is the collection of values that a given attribute of an object takes over time. As an important data type, it is used in almost every aspect of social life: domestic and foreign GDP, weather forecasts, final transaction prices of stocks and futures, and other indices recorded over a given time interval. Studying time series helps to mine valuable time-related information from the data and thereby supports knowledge extraction. Because time in the real world is unbounded, time series databases can become very large, reaching the terabyte scale. Time series are also high-dimensional, complex, dynamic, and noisy, so analyzing the original sequence directly usually leads to low efficiency, inaccurate mining results, and less reliable research conclusions. How to preprocess time series effectively has therefore become one of the most challenging research topics. Preprocessing of time series can be divided into two aspects: pattern representation and similarity measurement.

Pattern representation extracts the key features of a time series, reduces the dimension of the feature space, and preserves the shape of the original sequence, paving the way for subsequent research. Because of its high compression ratio and simple form, it has attracted many researchers. Traditional pattern representation algorithms usually ignore the temporal characteristics of time series, so their segmentation results are not accurate enough. To address this problem, this study proposes a time series representation algorithm based on a hyperbolic tangent constraint. The algorithm introduces the hyperbolic tangent function on top of piecewise aggregate approximation and proposes the concept of a motion enhancement factor. Taking into account the differences in the amount of information carried by each subsequence, it extracts features to determine the final segmentation, so that the segmented model fits the original series more closely, and then produces the final representation of the segmented series.

Similarity measurement, as the name implies, compares sequences and finds, in a specified database, a sequence that is similar to a given sequence under a certain definition. It is an important and basic pattern recognition task in time series data mining, and it is critical to detecting and predicting abnormal time series. The traditional dynamic time warping algorithm is sensitive to outliers and local noise points, which lowers its accuracy, especially on complex time series data. To address this problem, this paper proposes a novel similarity measure based on morphological distance and adaptive weights. First, the algorithm uses a trend filter to compress the original sequences to be compared. Second, it introduces a morphological distance to compute the distance matrix of the two time series. Finally, it uses an adaptive weighted distance function to capture the differences in information between subsequences and applies dynamic time warping to produce the final similarity measure.

A series of experiments was carried out on a large number of public data sets. The experiments showed that: (1) The time series representation algorithm based on the hyperbolic tangent constraint has a small fitting error. Segmenting time series data with this algorithm supports better macro similarity search when the time series can grow dynamically, and both generality and accuracy are improved. (2) The similarity measure based on morphological distance and adaptive weights is more robust and makes better use of the morphological features of a sequence to complete the macro similarity measure. It is also more efficient, stable, and accurate when dealing with complex data.
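
For orientation, the two classical baselines that the abstract builds on can be sketched as follows. This is a minimal illustration of plain piecewise aggregate approximation and plain dynamic time warping only. The abstract does not specify the hyperbolic tangent constraint, the motion enhancement factor, the trend filter, the morphological distance, or the adaptive weights, so none of them are implemented here. The function names `paa` and `dtw_distance` and the absolute-difference point cost are assumptions made for this sketch, not the thesis's definitions.

```python
from typing import List, Sequence


def paa(series: Sequence[float], n_segments: int) -> List[float]:
    """Baseline piecewise aggregate approximation (not the thesis variant).

    Splits the series into n_segments near-equal parts and keeps the mean
    of each part, reducing the dimension from len(series) to n_segments.
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    means = []
    for k in range(n_segments):
        # Integer boundaries; every segment is non-empty because n_segments <= n.
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        segment = series[start:end]
        means.append(sum(segment) / len(segment))
    return means


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Baseline dynamic time warping (not the thesis variant).

    Point cost is assumed to be the absolute difference; the accumulated
    cost matrix is filled with the standard three-way recurrence.
    """
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A weighted or shape-based variant such as the one the abstract describes would presumably replace the point cost `d` inside the loop, while the recurrence itself stays the same.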
Keywords/Search Tags:time series, piecewise linear representation, hyperbolic tangent function, similarity measure, morphological distance