Multivariate time series, the output values of multiple attributes of an observed system, arise in many fields. Mining useful information and finding hidden patterns in time series has become a meaningful research topic. Similarity analysis, the basis of time series data mining, is less mature for multivariate time series than for univariate time series. A review of the literature on time series similarity analysis at home and abroad shows that most current methods tolerate only one or two transformations, neglect the correlation between variables, and require fine-tuned parameters. In view of these facts, this thesis proposes two similarity measures for multivariate time series, one based on dimensionality reduction and the other on improving existing similarity measures for univariate time series. On the basis of these similarity measures, it also proposes a clustering method for multivariate time series based on a K-nearest neighbor network. The main contributions are as follows.

First, considering the high dimensionality, close variable correlation, and differing lengths of multivariate time series, this thesis proposes a similarity measure based on multiple segmentations, a Frobenius norm representation, and the weighted dynamic time warping algorithm. To reduce dimensionality, a hierarchical detection algorithm is first applied to find key points in the time series, and the series is then segmented as a whole by an error-based algorithm. For each segment, the Frobenius norms of the segment matrix and of its correlation matrix are used to approximate the segment matrix, so that the multivariate time series is compressed into a univariate time series. Because the compressed series differ in length, the weighted dynamic time warping algorithm is used to measure the similarity between two compressed time series.

Second, because most current methods neglect both the close correlation between variables and the shape features of time series, this thesis proposes another similarity measure for multivariate time series, based on common principal component analysis and a shape-based improved weighted dynamic time warping algorithm. To eliminate the correlations between variables and map the time series objects into the same dimensional space, principal component analysis is introduced and improved so that each multivariate time series is transformed into mutually independent principal component time series. The variance contribution rate of each principal component serves as the weight of the corresponding series. To take both the values and the shape features of time series into account, the weighted dynamic time warping algorithm is improved using the shape feature of each point in the series.

Third, because most current time series clustering algorithms are not satisfactory, this thesis explores a multivariate time series clustering algorithm based on a K-nearest neighbor network. A directed, weighted K-nearest neighbor network is built first, with multivariate time series objects as nodes, the similarity relations between objects (measured by the methods proposed above) as edges, and the similarity values as edge weights. On this network model, the BGLL algorithm, a hierarchical community structure partitioning algorithm, is used to cluster the time series objects in the network.

To verify the proposed similarity measures and clustering method, six datasets from the UCI repository were used in similarity search, classification, and clustering experiments. The results show that the proposed similarity measures perform better than most current methods in time series similarity search and 1NN classification experiments. The K-nearest neighbor network clustering algorithm proposed in the fifth section is also effective and feasible for multivariate time series clustering.
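
As a rough illustration of the first contribution, the sketch below compresses a multivariate series into one value per segment and compares two compressed series of different lengths with a weighted dynamic time warping distance. It is a minimal sketch under several assumptions, not the thesis's implementation: the segment boundaries are supplied by the caller (the key-point detection and error-based segmentation steps are not reproduced), each segment is summarized only by the Frobenius norm of its value matrix (the correlation-matrix term is omitted), and the weight is a logistic function of the index gap |i - j|, one common choice in the weighted DTW literature. The names `compress`, `weighted_dtw`, `g`, and `w_max` are hypothetical.

```python
import numpy as np


def compress(series, boundaries):
    """Summarize each segment of a (T, d) array by its Frobenius norm.

    `boundaries` holds the interior cut indices; finding them is assumed
    to be done elsewhere.
    """
    series = np.asarray(series, dtype=float)
    edges = [0, *boundaries, len(series)]
    return np.array(
        [np.linalg.norm(series[a:b], ord="fro") for a, b in zip(edges[:-1], edges[1:])]
    )


def weighted_dtw(x, y, g=0.1, w_max=1.0):
    """DTW distance whose local cost is scaled by a weight on |i - j|.

    Assumed weight: w(k) = w_max / (1 + exp(-g * (k - mid))), so alignments
    between points that are far apart in time cost more.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    mid = max(n, m) / 2.0

    def weight(k):
        return w_max / (1.0 + np.exp(-g * (k - mid)))

    # cost[i, j] is the cheapest weighted alignment of x[:i] with y[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = weight(abs(i - j)) * (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))
```

For two series `a` and `b` with caller-chosen cut points, a call such as `weighted_dtw(compress(a, cuts_a), compress(b, cuts_b))` would give a dissimilarity score that can feed a 1NN classifier or the K-nearest neighbor network described above.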