
Research On High Dimensional Time Series Representation And Classification Algorithm

Posted on: 2022-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Zhang
GTID: 1480306560489504
Subject: Computer Science and Technology
Abstract/Summary:
Time series classification is a research direction with a wide range of practical applications. The collection of time series data is usually affected by multiple factors, such as equipment, sampling technology, and the sampling environment. As a kind of vector data, time series lack clearly defined features, are high-dimensional, and exhibit complex and diverse intra-class variation. Hence, reconstructing the original data is usually the premise and foundation of time series classification research. On the one hand, data reconstruction reduces the amount of data the algorithm must process, which improves the training efficiency of the model. On the other hand, it aims to improve classification accuracy by expressing the essential information contained in the original time series more fully and clearly.

However, current research on time series representation and classification algorithms faces three problems. First, most feature learning models are built only from the training data and ignore the feature information contained in the test instances, which weakens the generalization performance and interpretability of the learned features. Second, research on representation models for data reconstruction faces many challenges, such as feature prototype definition, feature generation, feature selection, feature correlation analysis, and data transformation. Finally, research on classification algorithms suited to feature prototypes and reconstructed data lags behind. To address these problems, this dissertation studies feature-prototype-based representation models and classification algorithms for different types of time series data. Its main contributions are as follows:

(1) A lazy classification algorithm based on shapelets is proposed. First, current shapelet-based classification algorithms use large candidate shapelet sets that have weak relevance to the instance being classified and ignore the local feature information of the test instances. To address this, a candidate shapelet selection strategy based on the subsequence space of the instance to be classified is proposed. Then, to improve the efficiency and quality of the shapelet search, the concept of a shapelet evaluation dataset and a corresponding learning algorithm are proposed: for each instance to be classified, a dedicated evaluation dataset of discriminative features is built and used to search for the optimal shapelets. Finally, a lazy classification algorithm that combines global and local similarity is designed. In addition, to analyze the feature distribution and frequency information of the data, a shapelet coverage score is proposed to measure the discriminative strength of each time stamp. Experimental results show that the proposed algorithm achieves high accuracy and strong interpretability.

(2) A bag-of-shapelets representation model based on random projection is proposed. Unlike traditional time series transformation models based on the top-k shapelets, a fast shapelet dictionary learning algorithm based on random projection is proposed to build a feature set that more comprehensively reflects the local feature information of complex and variable time series. Then, instead of the simple transformation model based on the minimum distance between a shapelet and a time series, a bag-of-shapelets representation model built on the shapelet dictionary is proposed. During data transformation, this model considers not only the local matching degree between each shapelet and the complete time series but also the frequency of each shapelet. Experimental results show that data reconstructed with the proposed representation model achieves better classification performance than traditional shapelet-based transformation methods and many benchmark classification models.

(3) A time series representation model based on Symbolic Fourier Approximation (SFA) is proposed. First, current SFA-based feature generation cannot dynamically set the optimal number of Fourier values for sliding windows of different lengths. To solve this, a variable-length word extraction method is proposed that learns the optimal word length for each sliding window. Second, a new statistic for evaluating feature discriminativeness is designed based on tf-idf. Finally, based on the differences in discriminative power among features generated at different resolutions, a discriminative feature dictionary construction algorithm with a dynamic threshold is proposed, together with a corresponding representation model based on discriminative SFA features. Experimental results show that a logistic regression model achieves excellent classification results on time series data reconstructed from the tf-idf discriminative features.

(4) A multi-resolution ensemble classification algorithm based on SFA is proposed. First, to reduce the computational cost of SFA-based word length learning, a fast word length learning algorithm based on the trend of discriminative power across Fourier values is proposed. Second, to study the correlation between generated words, a co-occurring word generation model based on skip-bigrams is put forward. Finally, a multi-resolution ensemble classification mechanism is designed to address the curse of dimensionality in data reconstructed by SFA-based symbolic representation with a sliding-window mechanism. Compared with classification algorithms drawn from a range of theoretical approaches, the proposed algorithm performs strongly.

In conclusion, with the goals of improving data quality and enhancing model interpretability, this dissertation demonstrates the effectiveness of the proposed representation methods and classification algorithms in mining feature information from data and improving classification accuracy. These results lay a solid foundation for the practical application and in-depth study of high-dimensional, complex time series.
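Contribution (2) contrasts its model with the "simple transformation model based on the minimum distance between shapelet and time series". That conventional shapelet transform can be sketched as follows. This is a minimal illustration that assumes plain Euclidean distance without the per-subsequence z-normalization many implementations apply. The function names are hypothetical and not taken from the dissertation.

```python
import math
from typing import List, Sequence


def min_subsequence_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between `shapelet` and any same-length
    window of `series` (assumption: no z-normalization)."""
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet is longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best


def shapelet_transform(dataset: Sequence[Sequence[float]],
                       shapelets: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map each series to a vector holding one minimum distance per shapelet."""
    return [[min_subsequence_distance(s, sh) for sh in shapelets] for s in dataset]
```

For example, `shapelet_transform([[0, 1, 2, 3], [3, 2, 1, 0]], [[1, 2]])` gives a distance of 0 for the first series, which contains `[1, 2]` exactly, and √2 for the second. Each shapelet contributes only one number per series, so how often a shapelet occurs is lost. That is the limitation the bag-of-shapelets model is described as addressing.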
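The abstract says the shapelet dictionary in contribution (2) is learned with random projection, but it does not say how. One generic way to use random projection for this purpose is random-hyperplane locality-sensitive hashing. Each subsequence is projected onto a few random Gaussian directions and bucketed by the signs of those projections. Similar subsequences then tend to share a bucket, so well-populated buckets point to recurring patterns that could be considered as dictionary candidates. The sketch below assumes this LSH scheme. It is not the dissertation's algorithm, and `sign_hash` and `bucket_subsequences` are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sign_hash(sub: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per random direction: 1 if the projection is non-negative."""
    return tuple(int(sum(p * x for p, x in zip(plane, sub)) >= 0) for plane in planes)


def bucket_subsequences(dataset: Sequence[Sequence[float]], length: int,
                        n_planes: int, seed: int = 0) -> Dict[Tuple[int, ...], List[Tuple[int, int]]]:
    """Group every length-`length` subsequence, identified by (series id, start),
    by its random-projection sign hash."""
    rng = random.Random(seed)
    # Random Gaussian directions shared by all subsequences.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(n_planes)]
    buckets: Dict[Tuple[int, ...], List[Tuple[int, int]]] = defaultdict(list)
    for sid, series in enumerate(dataset):
        for start in range(len(series) - length + 1):
            key = sign_hash(series[start:start + length], planes)
            buckets[key].append((sid, start))
    return buckets
```

Identical subsequences always receive the same hash, and near-identical ones usually do. Adding more planes makes buckets purer but smaller. In practice, several independent hashings are often combined to reduce the chance that a good candidate is split across buckets by accident.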
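According to the abstract, the bag-of-shapelets transformation accounts for both how well each shapelet matches and how often it occurs. The frequency part can be sketched as a histogram of "votes" over the dictionary. This rests on two assumptions that are not from the source: every dictionary shapelet has the same length as the sliding window, and a window votes for its nearest shapelet only when that distance is within a hypothetical `threshold`.

```python
import math
from typing import List, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bag_of_shapelets(series: Sequence[float], dictionary: List[Sequence[float]],
                     window: int, threshold: float) -> List[int]:
    """Count, for each dictionary shapelet, the sliding windows whose nearest
    shapelet it is, ignoring windows farther than `threshold` from every entry."""
    counts = [0] * len(dictionary)
    for start in range(len(series) - window + 1):
        sub = series[start:start + window]
        dists = [_dist(sub, sh) for sh in dictionary]
        j = min(range(len(dictionary)), key=dists.__getitem__)
        if dists[j] <= threshold:
            counts[j] += 1
    return counts
```

With `series = [0, 1, 0, 1, 0, 5, 5]`, `dictionary = [[0, 1], [5, 5]]`, `window = 2` and `threshold = 0.5`, the result is `[2, 1]`. One simple way to capture both matching degree and frequency in one feature vector would be to concatenate these counts with the minimum distances from a shapelet transform.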
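Contributions (3) and (4) build on SFA, which turns each sliding window into a short word in two steps. First, it keeps the lowest-frequency DFT coefficients. Second, it discretizes each retained value with breakpoints learned from training data (Multiple Coefficient Binning, here in its equi-depth form). The sketch below uses a fixed word length. It does not implement the dissertation's variable-length or fast word length learning, and it omits the window normalization and alternative binning schemes used in practice. All function names are hypothetical.

```python
import cmath
from bisect import bisect_right
from typing import List, Sequence


def dft_features(window: Sequence[float], n_values: int) -> List[float]:
    """Real values from the lowest-frequency DFT coefficients:
    Re(X0), Re(X1), Im(X1), Re(X2), ... (Im(X0) is always 0, so it is skipped)."""
    n = len(window)
    out: List[float] = []
    k = 0
    while len(out) < n_values and k < n:
        c = sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        out.append(c.real)
        if k > 0:
            out.append(c.imag)
        k += 1
    return out[:n_values]


def mcb_breakpoints(samples: List[List[float]], alphabet: int) -> List[List[float]]:
    """Equi-depth binning per feature position: alphabet - 1 breakpoints chosen
    so that each symbol covers roughly the same share of training values."""
    bps = []
    for pos in range(len(samples[0])):
        col = sorted(s[pos] for s in samples)
        bps.append([col[(i * len(col)) // alphabet] for i in range(1, alphabet)])
    return bps


def sfa_word(window: Sequence[float], breakpoints: List[List[float]]) -> str:
    """Map a window to a word with one letter per retained DFT value."""
    feats = dft_features(window, len(breakpoints))
    return "".join(chr(ord("a") + bisect_right(bp, v)) for v, bp in zip(feats, breakpoints))
```

Typically the breakpoints are learned once from the DFT features of all training windows, for example `mcb_breakpoints([dft_features(w, 4) for w in train_windows], 4)`, where `train_windows` is a hypothetical list of windows. `sfa_word` is then applied to every window of every series. The word length, fixed here by the number of breakpoint lists, is exactly the quantity that contributions (3) and (4) propose to learn per window size.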
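Contribution (3) scores words with a new tf-idf-based statistic, but the abstract does not give its exact form. For reference, the sketch below computes a common tf-idf baseline in which each class's pooled words are treated as one document. The sublinear term frequency 1 + log(tf), the idf form log(C / df), and the function name are assumptions, not the dissertation's statistic.

```python
import math
from collections import Counter
from typing import Dict, List


def class_tfidf(words_by_class: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """tf-idf per class, where each class's bag of words is one 'document'.
    A word present in every class gets idf = log(1) = 0 (not discriminative)."""
    tf = {c: Counter(ws) for c, ws in words_by_class.items()}
    n_classes = len(tf)
    # Document frequency: the number of classes in which each word appears.
    df = Counter(w for counts in tf.values() for w in counts)
    return {
        c: {w: (1 + math.log(n)) * math.log(n_classes / df[w]) for w, n in counts.items()}
        for c, counts in tf.items()
    }
```

For `{"A": ["ab", "ab", "ba"], "B": ["ba", "bb"]}`, the word `"ba"` occurs in both classes and scores 0 in each, while `"ab"` and `"bb"` receive positive weights for their own classes. A dictionary-construction step could then keep the words whose weight exceeds a threshold. Contribution (3) describes making that threshold dynamic across resolutions.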
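Contribution (4) models co-occurrence between generated words with skip-bigrams. In the usual sense of the term, a skip-bigram is an ordered pair of words that may have up to k other words between them. The sketch below counts such pairs in the word sequence produced by sliding a window over one series. The `max_skip` parameter and the function name are illustrative assumptions, not the dissertation's settings.

```python
from collections import Counter
from typing import Sequence


def skip_bigrams(words: Sequence[str], max_skip: int) -> Counter:
    """Count ordered pairs (words[i], words[j]) with 1 <= j - i <= max_skip + 1,
    i.e. bigrams allowed to skip up to `max_skip` intermediate words."""
    pairs: Counter = Counter()
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_skip + 2, len(words))):
            pairs[(words[i], words[j])] += 1
    return pairs
```

`skip_bigrams(["ab", "bc", "cd"], 1)` counts `("ab", "bc")`, `("ab", "cd")` and `("bc", "cd")` once each, and with `max_skip=0` it reduces to ordinary bigrams. Consecutive sliding windows overlap heavily, so one may prefer to pair only words whose windows do not overlap. That variant changes only the range of `j`.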
Keywords: Time Series Classification, Lazy Learning, Bag of Patterns, Shapelet, Symbolic Fourier Approximation, Dictionary Learning, Skip-bigram Model, Ensemble Classification