
Research On The Time Series Classification Based On Diversified Top-k Shapelets

Posted on: 2018-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Sun
GTID: 2348330539475495
Subject: Computer application technology
Abstract/Summary:
A time series is a sequence of values of a statistical indicator of some phenomenon, recorded at different times. Because a real system or phenomenon is usually affected by many factors, the resulting time series show complex characteristics: high dimensionality, complex structure, high noise, and similarity deformation. Traditional time series analysis models the series with statistical methods, but these complex characteristics make it difficult for such models to meet the requirements of practical systems. Time series research based on data mining emerged in response and has become an active research field. Time series classification is an important topic in time series data mining; its task is to assign a class label to a given time series by constructing a classifier. Shapelets, a classification approach based on local shape features, can distinguish small differences between subsequences and therefore achieve good classification results; they can be applied in fields such as medical diagnosis and posture recognition. However, some problems remain to be solved. To address them, the main contributions of this thesis are as follows:

(1) A new time series classification method based on diversified top-k shapelets is proposed. Existing shapelet-based classification methods cannot effectively remove redundant shapelets. To address this, we introduce the top-k query method from the field of information retrieval and propose the concept of diversified top-k shapelets, together with a corresponding diversified top-k shapelets graph. These are used to process the candidate shapelets and select the shapelets that are most discriminative yet not similar to one another, which improves shapelet feature selection. In addition, SAX is used to reduce the dimensionality of the original time series datasets, which improves the efficiency of the shapelet-based classification method. The experimental results show that this method not only achieves higher accuracy than traditional classification methods, but also improves classification accuracy by 48.43% and 32.61% compared with the clustering-based method (ClusterShapelet) and the shapelet-coverage method (ShapeletSelection), respectively. At the same time, efficiency improves on all 15 datasets, with speedups ranging from 1.09 times to 287.8 times.

(2) Existing shapelet classification methods cannot handle imbalanced time series classification. To address this, a time series classification method for imbalanced datasets based on diversified top-k shapelets (DivIMShapelet+SMOTE) is proposed. AUC, an evaluation measure for imbalanced data classification, is used instead of the traditional information entropy to measure the quality of shapelets, and the training set is transformed using the diversified top-k shapelets technique. Finally, the SMOTE method is used to oversample the transformed training set. Because the AUC value is insensitive to class imbalance, the shapelet quality measure reflects classification performance more accurately. The method therefore not only extracts time series features effectively, but also balances the dataset on the basis of those features. The results show that the accuracy of DivIMShapelet+SMOTE is 38.8% and 10.2% higher than that of DivTopKShapelet and INOS+SVM, respectively; the AUC value increases by 0.37 and 0.08, and the F-measure by 0.35 and 0.15. The method can thus effectively classify imbalanced time series data.
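The selection idea in contribution (1), keeping the most discriminative shapelets while dropping those that are similar to one already kept, can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract describes a diversified top-k shapelets graph, whereas the sketch below uses a simple greedy pass as a stand-in. The function names, the similarity rule (sliding the shorter shapelet over the longer one), the `threshold` parameter, and the quality scores are all assumptions made for illustration.

```python
import math


def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any window of
    `series` that has the same length as the shapelet."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def are_similar(a, b, threshold):
    """Assumed similarity rule: slide the shorter shapelet over the longer
    one and compare the best match against `threshold`."""
    shorter, longer = sorted((a, b), key=len)
    return subsequence_distance(longer, shorter) < threshold


def diversified_top_k(candidates, k, threshold):
    """Greedy stand-in for diversified top-k selection.

    `candidates` is a list of (score, shapelet) pairs, where a higher score
    means a more discriminative shapelet. Candidates are visited in
    descending score order and kept only if they are not similar to any
    shapelet already kept.
    """
    chosen = []
    for score, shapelet in sorted(candidates, key=lambda c: c[0], reverse=True):
        if len(chosen) >= k:
            break
        if not any(are_similar(shapelet, kept, threshold) for _, kept in chosen):
            chosen.append((score, shapelet))
    return chosen


def shapelet_transform(dataset, shapelets):
    """Represent each series by its distances to the selected shapelets."""
    return [[subsequence_distance(series, s) for s in shapelets]
            for series in dataset]


# Illustrative candidates with made-up quality scores.
candidates = [
    (0.90, [1.0, 2.0, 3.0]),
    (0.85, [1.0, 2.0, 3.1]),  # near-duplicate of the first candidate
    (0.60, [5.0, 5.0, 5.0]),
]
selected = diversified_top_k(candidates, k=2, threshold=0.5)
features = shapelet_transform([[0.0, 1.0, 2.0, 3.0, 4.0]],
                              [s for _, s in selected])
```

In this toy run the second candidate is skipped because it nearly duplicates the first, so the third candidate is kept despite its lower score. In the setting of contribution (2), a transformed feature matrix of this kind would be the input to oversampling, and the scores would come from AUC rather than an entropy-based measure.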
Keywords/Search Tags: time series classification, shapelets, diversified top-k, imbalanced data classification