
Study Of Symbolic Aggregate ApproXimation For Time Series Classification

Posted on: 2019-03-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Song
GTID: 1368330545962410
Subject: Computer software and theory
Abstract/Summary:
Feature representation and measurement are the foundation of data mining, analysis, and application. Mapping the original time series into a low-dimensional feature space reduces dimensionality, removes noise, and lowers computational cost, which facilitates data mining and knowledge discovery. The growing demand for diverse time series data mining tasks has inspired a number of representation methods. Among them, the Symbolic Aggregate approXimation (SAX) method has become one of the de facto standards for discretizing time series and is widely used. Although SAX has proved to be efficient, it has several shortcomings: 1) it cannot fully avoid the loss of useful information, which reduces the accuracy of data mining and analysis; 2) its applications are mainly limited to univariate time series, and research on multivariate time series is lacking; 3) despite its many practical applications, analyses of its intrinsic properties, such as complexity, information loss, correlation, and periodicity, are rare; and 4) visualization research on SAX is lacking. In this context, we studied the SAX method in depth through a series of frameworks, models, and statistical measurements, combined with data classification.

The contributions and innovations of this thesis are as follows. 1) We proposed a Multi-phased approach framework that uses the voting and AdaBoost ensemble learning algorithms to remedy the information loss caused by the SAX representation. 2) We proposed a new model named CNMMRDV (Convolutional Network Model for MTS Representation based on Deconvolutional Verification) to learn representations of multivariate time series (MTS). Deconvolutional networks fully exploit the expressive power of deep neural networks in an unsupervised manner. We designed a network structure specifically to capture cross-channel correlation with deconvolution, and applied SAX-based discretization to the resulting feature vectors to further extract a bag of features. 3) We applied several statistical measurements and proposed a new one, the information embedding cost (IEC), to analyze the statistical behavior of the symbolic dynamics. 4) We proposed a new visualization approach for SAX analysis: we build a Markov transition matrix from the SAX representation to visualize the time series as a complex network.

Experiments on the UCR benchmark datasets, the CMU MTS dataset, and clinical signals demonstrate the following. The Multi-phased approach framework can avoid the loss of useful information, and the CNMMRDV model captures cross-channel correlation effectively. We show how this representation and the bag of features help classification, and a full comparison with sequence-distance-based approaches demonstrates the effectiveness of our approach on the standard datasets. The IEC score provides an a priori indication of whether SAX is adequate for a specific dataset, and it can be generalized to evaluate other symbolic representations. Together with the IEC score, the visualization approach helps to visually understand and explore classification tasks and the intrinsic properties of SAX.
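SAX itself, and the step in contribution 4) that builds a Markov transition matrix from a SAX representation, can be illustrated with a short sketch. The code below follows the textbook SAX recipe (z-normalize, reduce with Piecewise Aggregate Approximation, map each segment mean to a letter using equiprobable breakpoints of the standard normal distribution) and then counts first-order symbol transitions, row-normalizing them into a transition matrix. It is a minimal, standard-library-only sketch under stated assumptions, not the thesis's implementation: the function names are hypothetical, the series length is assumed to be a multiple of the number of segments, the alphabet is assumed to have at most 26 letters, and symbols with no outgoing transitions are left as all-zero rows.

```python
import bisect
from statistics import NormalDist, mean, pstdev


def znormalize(series, eps=1e-8):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma < eps:  # near-constant series: map everything to 0
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment.

    Assumption of this sketch: len(series) is a multiple of n_segments.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


def gaussian_breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into alphabet_size equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Hypothetical helper: SAX word for one univariate series."""
    cuts = gaussian_breakpoints(alphabet_size)
    letters = []
    for value in paa(znormalize(series), n_segments):
        # Symbol index = number of breakpoints <= value ('a' is the lowest).
        letters.append(chr(ord("a") + bisect.bisect_right(cuts, value)))
    return "".join(letters)


def markov_transition_matrix(word, alphabet_size):
    """Hypothetical helper: row-normalized first-order transition counts."""
    counts = [[0] * alphabet_size for _ in range(alphabet_size)]
    for current, nxt in zip(word, word[1:]):
        counts[ord(current) - ord("a")][ord(nxt) - ord("a")] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Symbols with no outgoing transitions keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    word = sax([0, 1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1, 0], 8, 4)
    print(word)
    for row in markov_transition_matrix(word, 4):
        print(row)
```

For example, `sax([0, 1, 2, 3, 4, 5, 6, 7], 4, 4)` yields `"abcd"`. Reading the resulting matrix as a weighted adjacency matrix over the alphabet is one way to obtain a complex network from a SAX word; how the thesis weights and lays out that network is not specified in this abstract.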
Keywords/Search Tags: time series classification, symbolic feature representation, symbolic aggregate approximation, convolutional neural network, intrinsic properties measurement, visualization