Time series data are found in nearly every area of social life. How to reduce the dimensionality of ever-growing, complex, continuous time series and to mine and use the valuable information they contain has become an active research problem in both industry and academia. Symbolic representation of time series has been widely studied and applied in time series data mining because of its good dimensionality reduction performance, its simplicity and efficiency, and its robustness to noise. Although time series classification methods based on symbolic representation have produced fruitful results, they still have shortcomings in reducing the number of dimensions and in effectively extracting and using data features. Improving the classification performance of symbolic time series classification algorithms is therefore an important topic in time series data mining. This paper studies dimensionality reduction and feature extraction and representation in symbolic representation classification of both univariate and multivariate time series, with the aim of reducing the computational complexity and improving the classification performance of classification algorithms based on symbolic representation. The main research contents are as follows:

(1) Most existing univariate time series classification methods based on symbolic representation do not consider the effect of prior knowledge of the class labels on classification performance. To address this, a time series classification method based on LDA (Linear Discriminant Analysis) symbolic representation, LDA_SC, is proposed. First, LDA maps the original high-dimensional time series data to a low-dimensional space while maximizing the discrimination between classes. Then, information gain and Multiple Coefficient Binning (MCB) are used to represent the dimension-reduced data as symbolic sequences. Finally, the distances between symbolic sequences are computed from a distance lookup table and used for classification. Experimental results on 20 data sets verify the validity of LDA_SC.

(2) Existing univariate time series classification methods based on symbolic representation do not consider how the nearest-neighbor relationship between samples affects symbolic classification when the time series are symbolized. To address this, a time series classification method based on OLPP (Orthogonal Locality Preserving Projection) symbolic representation, OLPP_SC, is proposed. First, OLPP reduces the dimensionality of the original time series data while explicitly preserving the nearest-neighbor relationship between samples. Then, information gain and MCB are used to represent the dimension-reduced data as symbolic sequences. Finally, the distances between symbolic sequences are computed from a distance lookup table and used for classification. Experimental results on 20 data sets show that the classification accuracy and dimensionality reduction performance of OLPP_SC are significantly better than those of existing methods, and that the method has good applicability.

(3) For multivariate time series (MTS), existing methods have deficiencies in dimensionality reduction performance, univariate symbolic representation classification methods are difficult to apply directly to MTS, and the influence of the nearest-neighbor relationship between samples on symbolic classification is ignored. This paper first proposes CSC (Center sequence Symbolic Classification), an MTS classification algorithm based on a center sequence symbolic representation. On 13 MTS data sets, its classification performance is better than that of the existing SAX-based MTS classification algorithm. This paper then proposes MOSC (MTS OLPP Symbolic Classification), an MTS classification algorithm based on OLPP symbolic representation. MOSC first uses OLPP to reduce the number of dimensions of each variable of the MTS. The variables are then expressed as symbolic sequences using information gain and MCB, and the mean distance (dmean) is used to classify the MTS samples after symbolic representation. Experimental results on 13 MTS data sets show that the classification performance of MOSC is better than that of the center-sequence-based symbolic representation method and of other existing methods.
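
The symbolization and lookup-table classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the input vectors are taken to be already projected to a low-dimensional space (the LDA or OLPP step is not shown); per-dimension breakpoints are chosen by simple equal-frequency binning as a stand-in for the information-gain-based MCB binning; the lookup table follows the usual SAX-style rule that equal or adjacent symbols have distance zero; and classification is plain 1-nearest-neighbor. All function names are hypothetical.

```python
from bisect import bisect_right


def fit_breakpoints(train, n_symbols):
    """Learn one list of breakpoints per dimension (MCB-style: each
    dimension gets its own bins). Equal-frequency cuts are an assumed
    stand-in for information-gain binning."""
    breakpoints = []
    for d in range(len(train[0])):
        column = sorted(row[d] for row in train)
        cuts = [column[(k * len(column)) // n_symbols]
                for k in range(1, n_symbols)]
        breakpoints.append(cuts)
    return breakpoints


def symbolize(vector, breakpoints):
    """Map each coordinate to the index of the bin it falls into."""
    return [bisect_right(cuts, value)
            for value, cuts in zip(vector, breakpoints)]


def build_lookup(breakpoints):
    """Per-dimension distance table between symbols: zero for equal or
    adjacent symbols, otherwise the gap between the two bins."""
    tables = []
    for cuts in breakpoints:
        n = len(cuts) + 1
        table = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if abs(i - j) > 1:
                    lo, hi = min(i, j), max(i, j)
                    table[i][j] = cuts[hi - 1] - cuts[lo]
        tables.append(table)
    return tables


def symbolic_distance(word_a, word_b, tables):
    """Euclidean-style distance assembled from table lookups only."""
    return sum(t[a][b] ** 2
               for a, b, t in zip(word_a, word_b, tables)) ** 0.5


def classify(query_word, train_words, train_labels, tables):
    """Return the label of the nearest training word (1-NN)."""
    best = min(range(len(train_words)),
               key=lambda i: symbolic_distance(query_word,
                                               train_words[i], tables))
    return train_labels[best]


# Toy data: six samples assumed to be already reduced to two dimensions.
reduced = [[-2.0, -1.5], [-1.8, -1.0], [-1.5, -1.9],
           [1.6, 1.2], [1.9, 1.8], [2.1, 1.4]]
labels = ["a", "a", "a", "b", "b", "b"]

breakpoints = fit_breakpoints(reduced, n_symbols=4)
tables = build_lookup(breakpoints)
words = [symbolize(v, breakpoints) for v in reduced]

query = symbolize([2.0, 1.5], breakpoints)
print(query, classify(query, words, labels, tables))
```

Because every distance is read from the precomputed tables, comparing two symbolic sequences costs one lookup per dimension, which is where the reduced computational cost of such methods comes from.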