
Research On Time Series Classification

Posted on: 2017-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J D Yuan
GTID: 1220330491451515
Subject: Computer Science and Technology
Abstract/Summary:
Time series data arise in almost every field of daily life. They are real-valued sequences that are high-dimensional, large in volume, and continually updated. The main difference between time series classification and traditional classification is that the variables of a time series are ordered by timestamp, whereas in traditional classification the order of the variables is unimportant and the correlation between variables is independent of their relative positions. As a result, time series classification has become one of the greatest challenges in data mining.

There are three major challenges in time series classification. First, traditional classifiers treat the input as a feature vector, but time series data have no explicit features. Second, although feature selection methods can be applied to time series, they are time-consuming because of the high dimensionality of the data. Third, besides being accurate, a classifier should often be interpretable, which is difficult for time series precisely because there are no explicit features. To address these three problems, this dissertation focuses on building interpretable classifiers for time series. The main contributions are as follows.

(1) A transformation method based on logical shapelets is studied. Time series shapelets are considered the most discriminative subsequences of a time series. First, shapelet discovery is time-consuming, even though shapelets are computed offline. This problem is addressed with an intelligent caching and reuse technique, which reduces the time complexity of finding shapelets by an order of magnitude. Second, to improve the interpretability of the shapelet transformation, a novel transformation based on conjunctions or disjunctions of shapelets is proposed. This method transforms the original time series data into traditional feature-vector data, which can be handled by classical classifiers.
Experimental results on the classic benchmark datasets used for these problems show that the logical shapelets transformation is efficient and can improve classification accuracy while retaining interpretability.

(2) A simple but effective shapelet pruning and coverage method is proposed. First, previous algorithms often select shapelets that are similar to one another. This work addresses the problem with an efficient and effective shapelet pruning technique that filters out similar shapelets and reduces the number of candidate shapelets at the same time. Second, on this basis, a novel shapelet coverage method is proposed for choosing the number of shapelets for a given dataset, which ensures that the original dataset is covered. Experiments on the classic benchmark datasets for time series classification, comparing against 1-NN with various distance metrics and against other shapelet-based methods, demonstrate that the proposed transformation is interpretable and also improves classification accuracy.

(3) To the best of our knowledge, this is the first work to discover association rules on time series datasets. The interpretability of the SAX-based associative classifier is presented, and experimental results show that classifiers built this way are competitive. First, a SAX (Symbolic Aggregate approXimation) representation, which discretizes the original time series into a symbolic string, is adopted because traditional association rule mining can only handle transaction datasets. Second, a modified eager CBA (Classification Based on Associations) algorithm is proposed to discover Class Sequential Rules and make the final prediction. On this basis, a lazy associative classifier is then proposed, in contrast to the eager one, which generates an excessive number of rules yet still fails to cover some test data with the discovered rules.
In addition, four different methods for selecting the mined rules are proposed for carrying out associative classification.

(4) An interpretable and robust k-NN (k Nearest Neighbours) classifier based on DTW (Dynamic Time Warping) is studied. k-NN is considered the benchmark classifier for time series classification, but it is not interpretable. To address this, a novel and effective time series weighting model is first proposed to provide a corresponding weight for each time series alignment. Then, a weighted DTW dissimilarity measure, based on two different functions, is proposed to discover discriminative subsequences of time series. Compared with k-NN classifiers based on other dissimilarity measures, our method is interpretable. Finally, we propose an extension of the weighting model to multivariate time series classification and discuss its special case, weighted Euclidean distance. We also evaluate the proposed method on sets of univariate and multivariate time series, demonstrating the utility of our discriminative local weighting model.

In conclusion, this dissertation addresses several facets of building time series classifiers and classifying instances. Experimental results show that the proposed methods discover time series shapelets or discriminative subsequences efficiently while improving interpretability. Moreover, this dissertation lays a sound foundation for real applications.
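As an illustration of the shapelet transformation in contribution (1), the following minimal Python sketch maps each series to its distances from a set of shapelets, giving an ordinary feature vector. The function names are hypothetical. The abstract does not say how conjunctions and disjunctions are scored, so taking the maximum distance for a conjunction and the minimum for a disjunction is an assumption. The caching speed-up and z-normalization are omitted.

```python
import math

def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equal-length sliding window of the series."""
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet is longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2
                          for i in range(m)))
        best = min(best, d)
    return best

def shapelet_transform(dataset, shapelets):
    """Turn each series into a feature vector of shapelet distances."""
    return [[subsequence_distance(s, sh) for sh in shapelets]
            for s in dataset]

def conjunction_distance(series, shapelets):
    """Assumed convention: an AND of shapelets is close only if
    every shapelet is close, so take the largest distance."""
    return max(subsequence_distance(series, sh) for sh in shapelets)

def disjunction_distance(series, shapelets):
    """Assumed convention: an OR of shapelets is close if any
    shapelet is close, so take the smallest distance."""
    return min(subsequence_distance(series, sh) for sh in shapelets)

if __name__ == "__main__":
    series = [0, 0, 1, 2, 1, 0, 0]
    shapelets = [[1, 2, 1], [5, 5]]
    print(shapelet_transform([series], shapelets))
```

The transformed vectors can then be passed to any classical classifier, which is where the interpretability argument applies: each feature corresponds to a concrete subsequence.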
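The SAX discretization step mentioned in contribution (3) can be sketched as follows. This is the usual textbook form of SAX (z-normalize, average over equal-width segments, then cut at equiprobable Gaussian breakpoints), not necessarily the exact variant used in the dissertation. The function name `sax`, the default alphabet, and the requirement that the series length be divisible by the number of segments are simplifying assumptions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev

def sax(series, n_segments, alphabet="abcd"):
    """Discretize a numeric series into a symbolic string."""
    if len(series) % n_segments != 0:
        # Simplifying assumption for this sketch only.
        raise ValueError("series length must be divisible by n_segments")

    # 1. z-normalize (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each segment.
    width = len(series) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal into
    #    len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # 4. Map each segment mean to the symbol of its region.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)

if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))
```

Once every series is a short string over a small alphabet, it can be treated like a transaction or sequence record, which is what makes rule mining algorithms applicable.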
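For contribution (4), the abstract does not specify the weighting model or the two weighting functions, so the sketch below only shows the general shape of the idea: standard DTW in which the local cost of each point of the first series is scaled by a weight, followed by a 1-NN prediction. The per-point weight vector, the squared-difference cost, and all function names are assumptions made for illustration.

```python
import math

def weighted_dtw(a, b, weights=None):
    """DTW dissimilarity where the cost of aligning a[i] with b[j]
    is weights[i] * (a[i] - b[j]) ** 2. With weights=None this is
    ordinary DTW. The weighting scheme is hypothetical."""
    n, m = len(a), len(b)
    if weights is None:
        weights = [1.0] * n
    # D[i][j] = cheapest cumulative cost aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = weights[i - 1] * (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

def predict_1nn(query, train_series, train_labels, train_weights=None):
    """Label of the training series closest to the query.
    train_weights, if given, holds one weight vector per training series."""
    best_label, best_dist = None, math.inf
    for idx, series in enumerate(train_series):
        w = None if train_weights is None else train_weights[idx]
        d = weighted_dtw(series, query, w)
        if d < best_dist:
            best_label, best_dist = train_labels[idx], d
    return best_label

if __name__ == "__main__":
    train = [[0, 1, 2, 1, 0], [5, 5, 5, 5, 5]]
    labels = ["bump", "flat"]
    print(predict_1nn([0, 0, 1, 2, 1, 0], train, labels))
```

In a scheme of this kind, points with large weights dominate the dissimilarity, so inspecting the weights indicates which part of a series drives the decision. If the warping path is restricted to the diagonal for equal-length series, the measure reduces to a weighted Euclidean distance.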
Keywords/Search Tags: Time series, Time series classification, Shapelets transformation, Associative classification, Interpretable, k-nearest neighbours