
Research On Feature Representation And Classification Methods In Time Series Data Mining

Posted on: 2019-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Hu
GTID: 1368330572954322
Subject: Software engineering

Abstract/Summary:
Time series are time-related, high-dimensional data that arise widely across real-world application areas. Besides the traditional big-data characteristics of large volume and high dimensionality, time series are also generated continuously, at high speed and potentially without end; such data are referred to as streaming time series. Because streaming time series are large, high-dimensional and continuous, data mining tasks such as time series query, similarity measurement, pattern recognition, classification and clustering cannot be studied in the same depth as they can on small, static datasets. Discovering potential knowledge from streaming time series has therefore become a hot issue in current data mining research. According to relevant research results, time series data mining has become one of the top 10 most challenging problems in data mining in this century and has received extensive attention from researchers at home and abroad. In this dissertation, we study two key issues in time series data mining: time series feature representation and time series classification. The main research contents and innovations of this thesis are as follows.

Firstly, we propose an online representation method based on turning points for streaming time series, called OPLR-TP for short. OPLR-TP produces a piecewise linear representation of a streaming time series in an "online" manner, and it can do so under either of two criteria: the maximum error per point or the maximum error per segment. Turning points and an optimal merging strategy are used to keep OPLR-TP efficient. We carried out a large number of comparative experiments between OPLR-TP and the state-of-the-art online representation methods for streaming time series. The experimental results show that OPLR-TP achieves both higher representation accuracy and higher operating efficiency than the baseline methods. In addition, OPLR-TP is less affected by parameter changes and shows good robustness.

Secondly, we propose a multi-resolution hybrid representation method based on an adaptive representation index for streaming time series, called MHR-ARI for short. MHR-ARI is a general method that can produce multi-resolution segmentation representations and multi-resolution symbolic representations of streaming time series. With the help of the adaptive representation index, called ARI for short, multi-resolution representation results based on piecewise linear representation (PLR), piecewise aggregate approximation (PAA) and symbolic aggregate approximation (SAX) can be generated efficiently according to different representation requirements. We conducted a large number of comparative experiments between MHR-ARI and the state-of-the-art multi-resolution representation method. The experimental results show that MHR-ARI is more efficient in ARI construction and PLR, and that it can also provide multi-resolution representation results based on PAA and SAX at the same time.

Thirdly, we propose an efficient shapelet selection method, called ESS for short, to improve the shapelet selection efficiency of the original shapelet-based time series classification (TSC) method, Shapelet Transformation, called ST for simplicity. Replacing the original shapelet selection strategy in ST with ESS yields a novel ensemble TSC method named ST-ESS. ESS improves shapelet selection efficiency by selecting representative time series and refining shapelet candidates, which in turn improves the overall efficiency of ST-ESS. Extensive empirical results on a large number of benchmark time series datasets demonstrate that ESS has a higher selection efficiency than other current state-of-the-art shapelet selection strategies. Moreover, ST-ESS can be three orders of magnitude more efficient than the original ST method, while keeping classification accuracy at the same level as that of ST.

Finally, we propose a deep learning model based on multiple representations for time series classification, namely the Multi-Representation Recurrent Neural Network (MR-RNN). MR-RNN uses multiple channels to automatically learn latent features from different perspectives, with each channel working on the representation results generated by a different time series representation method (OPLR-TP, MHR-ARI). MR-RNN then performs classification based on the key latent features it has acquired. We ran classification experiments for MR-RNN on a large number of time series datasets. The experimental results show that MR-RNN has a more comprehensive feature learning ability and a higher classification accuracy than the other baseline methods. In addition, MR-RNN leverages a parallel attention mechanism to automatically identify the important parts of the obtained features, which enhances the interpretability of the classification results.
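
As background for the PAA and SAX representations that MHR-ARI builds on, the following is a minimal sketch of the standard textbook formulations, computed at two resolutions. It is not the dissertation's MHR-ARI method and does not implement the ARI index; the function names (`znorm`, `paa`, `sax`), the four-letter alphabet and the sample series are assumptions chosen purely for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalise a series (zero mean, unit variance)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each of n_segments chunks."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    # Integer boundaries keep every chunk non-empty when n_segments <= n.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [mean(series[a:b]) for a, b in zip(bounds, bounds[1:])]


def sax(series, n_segments, alphabet="abcd"):
    """Symbolic aggregate approximation: PAA values mapped to letters using
    breakpoints that cut the standard normal into equiprobable regions."""
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    values = paa(znorm(series), n_segments)
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in values)


# Illustrative series (hypothetical data), represented at two resolutions.
series = [1, 2, 3, 4, 5, 6, 7, 8]
coarse = sax(series, 2)
fine = sax(series, 4)
```

Calling `sax` with different segment counts gives a coarser or finer symbolic view of the same series. This sketch simply recomputes each resolution from scratch; the abstract does not describe how ARI derives the different resolutions, so no attempt is made to reproduce that here.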
Keywords/Search Tags: Data Mining, Streaming Time Series, Time Series Feature Representation, Time Series Classification