Research On Time Series Data Mining Based On Trend And Feature Subsequences

Posted on: 2020-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Zhu
GTID: 2370330578463925
Subject: Software engineering
Abstract/Summary:
As society enters the era of big data, human activity generates data in every industry, and the volume of data has grown rapidly with the development of technology. Such data are widely used in business, meteorology, agriculture, the biological sciences and ecology. In commercial markets, for example, people track weekly profit, monthly price indices and annual sales volume; in meteorology, they record daily high and low temperatures, annual precipitation, drought indices and hourly wind speed. These data contain a great deal of useful information, so mining them thoroughly and carefully would be of great value to society. However, because the data are wide-ranging and voluminous, the original sequences are high-dimensional, affected by many interference factors, and change dynamically as they are updated in real time. These characteristics make it very difficult to extract knowledge from the original sequences and lead to low accuracy. To address this, the original data must undergo an effective preprocessing step. Two key parts of time series preprocessing are the piecewise linear representation of time series and the similarity measurement of time series. This study therefore investigates both aspects in order to achieve better preprocessing results.

The first two chapters summarize the research background, significance and current state of research. The main innovations and the corresponding work are presented in the third, fourth and fifth chapters and can be summarized as follows:

(1) The first contribution addresses the huge volume, high dimensionality and high complexity of current data, and a shortcoming of the existing
methods, namely their low compression rate when segmenting. In the third chapter, we study the geometry of upward and downward trends in time series and propose the concepts of upper and lower filter points and upper and lower filter lines. Because the upper and lower filter lines and points capture the geometric characteristics of a trend well, we propose a global trend judgment method and, building on it, a trend-based piecewise linear representation method. The method is easy to implement, since its time complexity is O(n). Experimental results show that, compared with similar algorithms, it produces fewer segments while still approximating the original sequence well.

(2) The second contribution addresses the heavy dependence of the traditional Euclidean distance measure on sequence values. The fourth chapter considers several geometric aspects, including points, angles and three elements: point value gaps, sequence trend changes and morphological differences. The proposed algorithm integrates these three geometric features and constructs a triangle shape distance, which is used as a threshold to analyze the micro-level distance and the macro-level trend distance between two sequences and thereby measure their similarity. We call it the adaptive triangle distance (ATD) algorithm. Experimental results show that ATD greatly improves the efficiency of measuring time series pairs, and that its measurement accuracy is much higher than that of traditional measurement algorithms.

(3) To address the inability of traditional similarity measurement algorithms to handle such problems dynamically, in the fifth chapter we propose a similarity measurement algorithm for time series
based on adaptive feature subsequences. The method draws on the scale space theory of signals. It selects multiple subsequences at random positions and with random lengths, and divides each of them into shorter intervals in order to capture the information in the time series data. For each interval, features such as the trend of the fitted regression line, the average difference of values and the variance are extracted to describe the morphology and distribution of the sequence. The features computed from these subsequences generate a new data set: the features of each subsequence provide one instance, and all instances of a time series form a bag. A class label is also defined for each instance, so that the expansion of the attributes and of the original sequence at different locations can be measured. This provides a feature-based approach that differs from DTW but can also handle warping. We call this the similarity measurement algorithm based on adaptive feature subsequences, or the AFS algorithm for short. Compared with mainstream algorithms and with the ATD algorithm, when the two time series to be measured differ in length or in time span, the AFS algorithm can complete the similarity measurement dynamically, without manual pairing of the sequence dimensions, so the time and space complexity is greatly reduced.
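
The interval feature extraction described for the AFS algorithm can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names (`interval_features`, `split_evenly`, `subsequence_bag`) and all parameters are hypothetical, "average difference of values" is interpreted here as the mean absolute difference between consecutive values, and subsequence positions and lengths are drawn uniformly at random.

```python
import random
from statistics import fmean, pvariance


def interval_features(values):
    """Return [slope, mean absolute successive difference, variance].

    Assumes len(values) >= 2. The slope is that of the least-squares
    line fitted to the values against their index 0..n-1.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = fmean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    # Assumed reading of "average difference of values".
    mean_diff = fmean([abs(b - a) for a, b in zip(values, values[1:])])
    return [slope, mean_diff, pvariance(values)]


def split_evenly(seq, k):
    """Split seq into k contiguous parts whose lengths differ by at most 1."""
    base, extra = divmod(len(seq), k)
    parts, start = [], 0
    for i in range(k):
        end = start + base + (1 if i < extra else 0)
        parts.append(seq[start:end])
        start = end
    return parts


def subsequence_bag(series, n_subsequences=5, n_intervals=3,
                    min_interval_len=3, seed=0):
    """Build a bag of instances, one per randomly chosen subsequence.

    Each instance concatenates the features of the n_intervals intervals
    of one subsequence, so it always has 3 * n_intervals values.
    """
    rng = random.Random(seed)
    series = [float(v) for v in series]
    min_len = n_intervals * min_interval_len
    if len(series) < min_len:
        raise ValueError("series too short for the requested intervals")
    bag = []
    for _ in range(n_subsequences):
        length = rng.randint(min_len, len(series))    # random length
        start = rng.randint(0, len(series) - length)  # random position
        sub = series[start:start + length]
        instance = []
        for interval in split_evenly(sub, n_intervals):
            instance.extend(interval_features(interval))
        bag.append(instance)
    return bag
```

Because every instance has the same number of features regardless of the length of the series it was drawn from, bags built from series of different lengths share one feature layout and need no point-by-point alignment. How the bags are then compared or classified is not specified in the abstract and is left out of this sketch.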
Keywords/Search Tags: time series, piecewise linear representation, trend, similarity measure, subsequence