Time Series Classification Methods Based On Shapelet

Posted on: 2018-03-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Ji
Full Text: PDF
GTID: 1310330542952128
Subject: Software engineering
Abstract/Summary:
A time series is a collection of values obtained from sequential measurements over time. It is usually treated as a whole rather than as individual numerical fields. Its high dimensionality, high feature correlation, fast refresh rate, and typically high noise levels make it an interesting research problem. Time series data analysis is becoming increasingly important in machine learning, data mining, and data warehousing, and it is one of the 10 challenging problems in data mining research. In time series classification, an unlabeled time series is assigned to one of two or more predefined classes, so it can be regarded as supervised learning. Time series classification is an active topic in time series research and is widely applied. In the manufacturing field it has a wide range of application scenarios: classification results can be used for system anomaly detection, system intrusion detection, or process control. Time series classification can therefore promote the development of intelligent manufacturing.

Currently, time series data classification, especially in the manufacturing field, faces the following problems and challenges:

(1) Multi-source and heterogeneous data: enterprises operate many different types of devices, and these devices generate heterogeneous data. Ingesting the data in a unified form is a big challenge.
(2) Big data: in manufacturing, a large number of devices on the production line produce observational data at high frequencies. This inevitably generates large-scale data. Storing and analyzing data in a big data environment has become another challenge.
(3) Fast time series classification methods: in manufacturing, the high frequency of device data acquisition leads to high time series dimensionality, which in turn leads to long training times when building classifiers. Quickly building a classifier for high-dimensional time series data is a great challenge for time series classification in the manufacturing field.
(4) Interpretable time series classification methods: a time series has no explicit features. Even after complex feature selection, the dimensionality of the candidate features remains very high, and the essential characteristics of the time series are difficult to capture. In manufacturing, guiding production and management decisions with time series classification results requires results that are intuitive and interpretable.

This dissertation focuses on the problems and challenges of time series data classification in the intelligent manufacturing field. Studies were carried out in three directions: a multi-source heterogeneous device data ingestion model, a time series representation method, and fast, interpretable time series classification algorithms. The main work and contributions are:

(1) Although the Internet of Things plays a significant role in the Industry 4.0 era, it currently faces the challenge of ingesting large-scale, heterogeneous, multi-type device data. In response, we present a heterogeneous device data ingestion model for an industrial big data platform. The model includes device templates and four strategies, for data synchronization, data slicing, data splitting, and data indexing, respectively. With this model we can ingest device data from multiple sources, as verified on our industrial big data platform. Based on this model, we ingest various forms of device data from several data sources in multiple enterprises and save the multi-source heterogeneous device data in a time series format on the industrial data platform we have built.
(2) When time series are characterized by large data volume, high dimensionality, and rapid updates, mining and presenting the original time series data is difficult. This dissertation incorporates important data points (IDPs) into a piecewise linear representation (PLR) of time series data, called PLR-IDP. The method finds important data points according to segment weights, which measure the segment fitting errors and the maximum point fitting errors, and then approximates the time series by lines defined by the important data points. Theoretical analysis and experiments show that PLR-IDP reduces the dimensionality while maintaining the main characteristics, with small fitting errors for both segments and single points. By constructing a binary tree, the method can meet the needs of different users. It responds quickly to user requests, so it is suitable for large-scale data in the intelligent manufacturing field.
(3) A shapelet is a fragment of a time series that can represent the class characteristics of the time series. A classifier based on shapelets is interpretable, more accurate, and faster. However, the time needed to find shapelets is enormous. This dissertation proposes a fast shapelet (FS) discovery algorithm based on important data points (IDPs). First, the algorithm identifies the IDPs. Next, subsequences containing one or more IDPs are selected as candidate shapelets. Finally, the best shapelets are selected. Results show that the proposed algorithm reduces the shapelet discovery time while maintaining the same level of classification accuracy.
(4) To address the problems that classifiers in the industrial field are not interpretable and take a long time to build, a fast shapelet selection (FSS) algorithm is proposed to accelerate the shapelet transform (ST) classification process. Shapelets are discriminative subsequences with the property that the minimum distance between a shapelet and a time series is a good predictor of the class of that time series. In our algorithm, we first sample some time series from the training data set with the help of a subclass splitting method. FSS then identifies the important data points (IDPs) of the sampled time series and selects the subsequences between two nonadjacent IDPs as shapelet candidates. Through these two steps, the number of shapelet candidates is greatly reduced, which leads to an obvious reduction in time complexity. Unlike other methods, which accelerate the shapelet selection process at the expense of accuracy, the experimental results demonstrate that FSS is thousands of times faster than ST with no loss of accuracy. Our results also demonstrate that our method is the fastest among the shapelet-based methods that have a leading level of accuracy. Moreover, FSS can be adopted in some ensemble methods to speed them up with no loss of accuracy.
(5) Based on FSS, we construct an online time series classification system. The system can realize real-time dynamic classification of time series in the intelligent manufacturing field.
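
To make two ideas from the abstract concrete, here is a minimal Python sketch. It illustrates (a) the minimum distance between a shapelet and a time series and (b) forming shapelet candidates from subsequences between two nonadjacent IDPs. It is not the dissertation's implementation. The function names, the use of plain Euclidean distance without normalization, the reading of "nonadjacent" as "at least one other IDP lies in between", the inclusive endpoints, and the hand-picked IDP positions in the usage note are all assumptions made for illustration. How IDPs are actually identified is not specified in the abstract and is not modeled here.

```python
import math
from typing import List, Sequence


def shapelet_distance(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any
    window of the series that has the same length as the shapelet.
    (Assumption: no normalization is applied.)"""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet over every alignment and keep the closest match.
    for start in range(len(series) - m + 1):
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, math.sqrt(sq))
    return best


def candidates_between_idps(series: Sequence[float],
                            idp_indices: Sequence[int]) -> List[List[float]]:
    """Return the subsequences that start at one IDP and end at another,
    nonadjacent IDP (assumption: at least one IDP lies between them,
    and both endpoints are included in the candidate)."""
    idx = sorted(idp_indices)
    candidates = []
    for a in range(len(idx)):
        # b starts at a + 2 so that the two IDPs are not neighbours.
        for b in range(a + 2, len(idx)):
            candidates.append(list(series[idx[a]: idx[b] + 1]))
    return candidates
```

As a usage example, for the series `[0, 0, 1, 3, 1, 0, 0, 2, 0]` with hypothetical IDP positions `[0, 3, 5, 7, 8]`, `candidates_between_idps` yields six candidates instead of every possible subsequence, and `shapelet_distance([1, 3, 1], series)` is zero because that pattern occurs exactly in the series. Restricting candidates to IDP-bounded subsequences is what shrinks the search space; each surviving candidate would then be scored by how well its distances separate the classes.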
Keywords/Search Tags: intelligent manufacturing, time series data, classification, data ingestion, data representation, shapelet