
Research On Key Issues In Time Series Data Mining

Posted on: 2015-03-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X X He
GTID: 1260330428984387
Subject: Computer application technology
Abstract/Summary:
A time series, an objective record of an observed system or phenomenon at each moment, arises in many fields. Analyzing the time series makes it possible to identify and reconstruct the dynamic behavior of the underlying system. The traditional statistical approach builds a global model of the time series with an analytic function, but it requires the time series to satisfy certain assumptions. Time series collected from real life, however, often have complex structure and a large amount of noise, so the required assumptions are hard to satisfy and the global model is hard to capture with an analytic function. Hence there is a strong need for methods that take full advantage of the time series to uncover more latent regularities and knowledge.

Driven by this demand, time series data mining emerged. As a young and active research area, data mining offers a fresh perspective for time series research. Unlike other data types in data mining, time series are complicated by the following characteristics: high dimensionality, large volume, noise, amplitude scaling or shifting, scaling along the time axis, and linear drift. These inherent characteristics make time series data mining challenging. Although researchers at home and abroad have made many discoveries in time series data mining, several critical problems remain unsolved.

In this dissertation, we study these unsolved critical problems in time series data mining, including approximate representation of time series, similarity measures, and time series clustering. Our main contributions are as follows.

First, because most current approximate representation methods for time series need fine-tuned parameters and pursue dimensionality reduction at the expense of the basic information, we propose a non-parametric symbolic approximate representation model, NSAR.
Unlike traditional approximate representations, NSAR reduces dimensionality through symbolization and encoding rather than by discarding the basic information. It also uses multi-scale wavelet approximation coefficients and key points to retain as much of the basic information as possible. To avoid careful parameter tuning, NSAR is designed to be non-parametric, which is ensured in three ways: first, in the multi-scale DWT, the decomposition level log2(n) is determined by the length n of the time series; second, key points are extracted from the wavelet approximation coefficients, so no threshold needs to be set to eliminate noise; third, symbolization is performed on the key point sequence and needs only two symbols, representing upward and downward movement. The experimental results show that NSAR greatly reduces dimensionality, retains more of the basic information of the original time series, and has no parameters.

Second, because current similarity measures cannot tolerate multiple distortions, we propose a new similarity measure based on shape information, SIMshape, that is invariant to multiple distortions. Unlike current similarity measures, SIMshape focuses on comparing the basic information when judging similarity and reduces the impact of the detailed information. This is because distortion in similar sequences alters only the local detailed information and leaves the basic shape information unaffected. SIMshape is computed from multi-scale shape information and a scale weight function. The scale weight function further reduces the disturbance caused by distortion by assigning larger weights to coarse levels and smaller weights to fine levels.
The experimental results show that SIMshape is more robust: it tolerates a greater degree of distortion and more kinds of distortion.

Third, because most current time series clustering methods give unsatisfactory results and the clustering process lacks spontaneity, we propose a new clustering method based on global characteristics and a nuclear field. To avoid excessive human intervention, the proposed dynamic clustering uses the nuclear field to describe the interaction between objects in the data space. It can therefore find the natural hierarchy of clusters through the movement of data objects under their mutual nuclear force, without human interference. Meanwhile, to address low-quality clustering results, the proposed method takes the optimal global features, rather than the original time series, as the clustering input. Because the optimal features place time series from different clusters in different regions of the data distribution, the subsequent clustering algorithm can find the true clusters more easily. The experimental results show that the proposed clustering method discovers the true clusters spontaneously, tolerates time series of unequal length, and is invariant to noise and missing data.

In summary, this dissertation makes three contributions: a non-parametric symbolic approximate representation model that reduces the loss of basic information and requires no parameters; a new similarity measure based on shape information that is invariant to multiple distortions; and a new time series clustering method based on global characteristics and a nuclear field that finds the true clusters spontaneously.
Experimental results show that the proposed methods perform better in time series best matching, effectively reduce the error of one-nearest-neighbor time series classification, and find the true clusters spontaneously.
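
The three NSAR steps named in the abstract (multi-scale wavelet approximation, key point extraction, two-symbol encoding) can be illustrated with a minimal sketch. This is not the dissertation's implementation. The abstract does not state the wavelet family or the key point rule, so the sketch assumes a Haar wavelet and treats local extrema plus the two endpoints as key points. The function names and the symbols "U" and "D" are hypothetical, chosen only for illustration.

```python
import math


def haar_approximations(series):
    """Haar approximation coefficients at every level, finest level first.

    Assumption: the input is truncated to the largest power-of-two length,
    so the number of levels is log2(n) and no level parameter is needed.
    """
    if len(series) < 2:
        raise ValueError("need at least two samples")
    n = 1 << (len(series).bit_length() - 1)  # largest power of two <= len
    current = list(series[:n])
    levels = []
    while len(current) > 1:
        # One Haar step: scaled sum of each adjacent pair.
        current = [(current[i] + current[i + 1]) / math.sqrt(2)
                   for i in range(0, len(current), 2)]
        levels.append(current)
    return levels


def key_points(values):
    """Keep the endpoints and every strict local extremum (assumed rule)."""
    if len(values) <= 2:
        return list(values)
    kept = [values[0]]
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # direction changes at cur
            kept.append(cur)
    kept.append(values[-1])
    return kept


def symbolize(points):
    """Encode each move between consecutive key points with two symbols.

    "U" marks an upward move; anything else (including a tie, an arbitrary
    choice made here) is encoded as "D".
    """
    return "".join("U" if b > a else "D" for a, b in zip(points, points[1:]))


def nsar_like_sketch(series):
    """Return one up/down string per decomposition level."""
    return [symbolize(key_points(level))
            for level in haar_approximations(series)]


if __name__ == "__main__":
    print(nsar_like_sketch([0, 0, 4, 4, 1, 1, 6, 6]))
```

For the eight-point series in the example, the sketch yields one string per level, `['UDU', 'U', '']`: the finest level keeps the rise, fall, and rise of the series, the next level keeps only the overall rise, and the coarsest level holds a single coefficient and therefore no move to encode. Nothing in the sketch is tuned by hand, which mirrors the non-parametric intent described above.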
Keywords/Search Tags: time series data mining, approximate representation, similarity measure, one nearest neighbor classification, clustering