Research On Feature Representation And Similarity Measure Methods In Time Series Data Mining

Posted on: 2013-11-13 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H L Li | Full Text: PDF
GTID: 1228330395999261 | Subject: Management Science and Engineering
Abstract/Summary:
With the development of the social economy and information technology, the volume of time series data is growing ever faster. Data mining techniques are used to discover potentially valuable information and knowledge in time series databases, and this has attracted increasing attention. Research achievements in this field have been successfully applied in many domains, including economics, finance, electronic information, medicine, education, industry and engineering. Feature representation and similarity measurement are among the most fundamental and important tasks in time series data mining, and their quality often determines the quality of the mining results. Time series grow over time, and their high dimensionality, dynamics and uncertainty hinder the application of traditional data mining techniques. Feature representation approximates a time series with a small number of features. This reduces its dimensionality and so improves the efficiency of data mining tasks. Similarity measurement quantifies the difference between time series and is often combined with feature representation. Its results can be used in time series data mining tasks such as classification, clustering, similarity detection and anomalous pattern discovery. This dissertation studies time series of equal length and of unequal length and develops feature representation and similarity measure methods for each kind of data. The aim is for these methods to be applied effectively in time series data mining so that potentially valuable information and knowledge can be obtained.
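The sketch below shows how a feature representation reduces dimensionality and how a similarity measure can be paired with it. It uses the standard piecewise aggregate approximation (PAA) from the literature, not the dissertation's own two-dimensional or cloud-model variants. Each series of length n is replaced by the means of w equal-width segments. The distance sqrt(n/w) times the Euclidean distance between the two PAA vectors is a known lower bound of the Euclidean distance between the original series, so candidates can be pruned without false dismissals. The function names paa, euclidean and paa_distance are hypothetical, and requiring n to be divisible by w is a simplifying assumption of this sketch.

```python
import math

def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment.

    Simplifying assumption: len(series) is divisible by `segments`.
    """
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be divisible by the number of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]

def euclidean(x, y):
    """Plain Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def paa_distance(px, py, n):
    """Distance between two PAA vectors of series of original length n.

    sqrt(n / w) * Euclidean(px, py) never exceeds the Euclidean distance
    between the original series, which is what makes it a lower bound.
    """
    w = len(px)
    return math.sqrt(n / w) * euclidean(px, py)
```

Take x = [1, 2, 3, 4, 5, 6, 7, 8] and y = [2, 2, 2, 2, 6, 6, 6, 6] with w = 2. The PAA vectors are [2.5, 6.5] and [2.0, 6.0]. The PAA distance is sqrt(2), about 1.414, which stays below the original Euclidean distance sqrt(12), about 3.464, as the lower-bound property requires.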
The main research work is as follows:
First, for the global features of equal-length time series, similarity measure methods are proposed that rest on a feature representation built from orthogonal polynomial regression coefficients. The dissertation analyzes how the highest degree of the polynomial affects how well the fit captures the global shape of a time series. On that basis, a group of suitable feature coefficients is chosen to reflect the main shape of the series, and an improved similarity measure is proposed to compute the distance between the resulting feature sequences. It is also proven in theory that the distance function satisfies the lower-bound property. Together, these results improve the performance of the method in time series similarity search.
Second, to address problems with feature representation based on piecewise aggregate approximation (PAA) for equal-length time series, multidimensional features are used to represent time series, and corresponding lower-bounding similarity measures are derived. Building on an analysis of traditional PAA and of lower-bounding similarity measures, each piecewise segment is represented by features of more than one dimension. This yields two approximation methods, one based on two-dimensional statistical characteristics and one based on two-dimensional shape characteristics. Both improve the efficiency of traditional PAA methods in time series data mining. The two-dimensional segment representation is further extended to a higher-dimensional one, which improves the performance of the distance function on feature sequences at high data compression ratios.
Third, a piecewise representation based on cloud model theory is proposed for equal-length time series, together with similarity measures that perform well. The cloud model reflects the uncertainty in the distribution of each piecewise segment. Similarity measures between cloud models are then given, which make it possible to measure the similarity between cloud feature sequences. These cloud-model-based measures do not satisfy the lower-bound property, but they take the volatility and uncertainty of time series into account from both local and global perspectives. They also give better similarity measurement quality and effectively improve the performance of related time series data mining algorithms.
Fourth, two improved warping measures are proposed to address the high time cost of the traditional dynamic time warping (DTW) method when it is used to measure unequal-length time series. The first balances computation speed against measurement accuracy. An adaptive, fast piecewise linear approximation represents the time series, and a similarity measure based on derivative dynamic time warping computes the distance between unequal-length time series quickly and effectively; together, these form a new algorithm for representing time series and measuring their similarity. The second addresses the high computational cost of DTW by narrowing the search scope for the optimal warping path and terminating the search early. This improves the computational efficiency of traditional DTW in time series similarity search.
Fifth, the feature representation and similarity measure methods are applied to engine data mining. Guided by the properties of engine performance parameters, a new group of feature representations and similarity measures is used to mine engine performance parameter data. This effectively accomplishes feature recognition and fault detection in the time series of engine performance parameters. It also offers a new perspective for knowledge discovery in the engine design process and provides a reference for the safe operation of engines.
Experiments verify that the above research results are effective for representing different kinds of time series data and measuring their similarity. The experiments also compare how much each method improves the related time series data mining algorithms. Together, these results further refine the study of feature representation and similarity measurement in time series data mining.
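The fourth contribution speeds up DTW in two ways: it narrows the search scope of the warping path and it stops the search early. The sketch below shows one common way to realize both ideas, offered as an assumption and not as the dissertation's exact algorithm. A Sakoe-Chiba band leaves cells with |i - j| > w unfilled; the band is widened to at least |n - m| so that unequal-length series still have a valid path. Early abandoning stops the computation once every cell in a row exceeds the best distance found so far, because costs are non-negative and every warping path crosses every row. The names dtw_banded and nearest_neighbor, the window parameter and the squared-difference cost are all illustrative choices.

```python
import math

def dtw_banded(x, y, window, best_so_far=math.inf):
    """DTW distance (sqrt of summed squared differences) inside a Sakoe-Chiba band.

    Returns math.inf as soon as the distance is certain to exceed best_so_far.
    """
    if not x or not y:
        raise ValueError("sequences must be non-empty")
    n, m = len(x), len(y)
    w = max(window, abs(n - m))          # band wide enough to reach cell (n, m)
    limit = best_so_far * best_so_far    # compare in squared space
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0                        # D(0, 0) = 0; D(0, j) = inf for j > 0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)      # cells outside the band stay infinite
        row_min = math.inf
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
            row_min = min(row_min, curr[j])
        if row_min > limit:
            return math.inf              # early abandoning: no path can beat best_so_far
        prev = curr
    return math.sqrt(prev[m])

def nearest_neighbor(query, candidates, window):
    """Return (index, distance) of the closest candidate under banded DTW."""
    best, best_idx = math.inf, -1
    for idx, cand in enumerate(candidates):
        d = dtw_banded(query, cand, window, best)
        if d < best:
            best, best_idx = d, idx
    return best_idx, best
```

In a similarity search, nearest_neighbor passes the current best distance into each new DTW computation. Poor candidates are therefore abandoned after only a few rows. The band width trades speed against accuracy: a small window fills fewer cells but may miss the true optimal warping path.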
Keywords/Search Tags: Time Series, Data Mining, Feature Representation, Similarity Measure, Distance Measure