Multivariate time series are widespread in many application domains, such as astronomy, finance, and meteorology. Because a multivariate time series contains multiple variables, studying it can yield results that reflect the real situation more accurately than studying a single variable. Data mining of multivariate time series therefore helps people make correct analyses and decisions, and it has attracted increasing attention. The main tasks of multivariate time series data mining include similarity search, classification, clustering, prediction, and rule discovery. Similarity search is the prerequisite and basis for the other time series mining tasks, so research on similarity search for multivariate time series is particularly important.

Based on a review of a large body of domestic and international literature on time series data mining and similarity search, this thesis systematically studies and summarizes the current state of multivariate time series similarity search technology. Following the framework of its technical development, the typical algorithms at each stage of similarity search are described and analyzed in detail, their problems are pointed out, and corresponding solutions are proposed. The main research work is as follows:

(1) This thesis proposes a similarity search algorithm for multivariate time series based on empirical mode decomposition (EMD). First, EMD is used to extract the trend of each dimension, which yields the trend plane of the multivariate time series. Second, the trend sequences are segmented with a bottom-up approach, and the segments are transformed into a character sequence over the alphabet {-1, 0, 1}. Finally, two multivariate time series are compared by counting the similar segments in their character sequences.

(2) This thesis introduces a hierarchical strategy that organizes the entire execution of the similarity search algorithm into successive stages.
First, a coarse match based on a lower-bounding distance is performed to select candidate sequences. Second, the trends of these candidate sequences are extracted with EMD. Third, the trend sequences are segmented with the bottom-up approach, the segments are transformed into character sequences over {-1, 0, 1}, and a second round of matching is performed on these character sequences. Finally, the remaining candidates are compared with the query using the Euclidean distance.
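The staged search in (2), which reuses the EMD-trend symbolization of (1), might be sketched as follows. This is a minimal illustration under assumptions the abstract does not specify. The lower-bounding distance is taken to be the PAA (piecewise aggregate approximation) lower bound of the multivariate Euclidean distance. EMD is simplified: envelopes use linear interpolation between extrema instead of cubic splines, sifting runs a fixed number of iterations instead of using a stopping criterion, and the final residue is treated as the trend. Each trend is segmented bottom-up into a fixed number `k` of segments, and a segment's symbol is 1, -1 or 0 depending on whether its slope is above `eps`, below `-eps`, or in between. Two series are compared by the fraction of position-wise matching symbols. All function and parameter names (`emd_trend`, `bottom_up`, `lb_paa`, `hierarchical_search`, `k`, `eps`, `min_sim`, `w`) are hypothetical.

```python
import math
from typing import List, Tuple

Series = List[float]  # one dimension of a time series
MTS = List[Series]    # multivariate series: one Series per dimension


def _extrema(x: Series) -> Tuple[List[int], List[int]]:
    """Indices of interior strict local maxima and minima."""
    mx = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    mn = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return mx, mn


def _envelope(idx: List[int], x: Series) -> Series:
    """Linear interpolation through x at idx and both end points (assumed stand-in for a cubic spline)."""
    n = len(x)
    knots = [0] + idx + [n - 1]
    env = [0.0] * n
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a) if b > a else 0.0
            env[i] = x[a] + t * (x[b] - x[a])
    return env


def emd_trend(x: Series, max_imfs: int = 8, sifts: int = 10) -> Series:
    """Simplified EMD: peel off IMFs by sifting; the final residue is used as the trend."""
    r = list(x)
    for _ in range(max_imfs):
        mx, mn = _extrema(r)
        if len(mx) < 2 or len(mn) < 2:  # residue is (nearly) monotone: treat it as the trend
            break
        h = list(r)
        for _ in range(sifts):  # fixed sift count (assumption) instead of an SD stopping rule
            mx, mn = _extrema(h)
            if len(mx) < 2 or len(mn) < 2:
                break
            up, lo = _envelope(mx, h), _envelope(mn, h)
            h = [v - (u + l) / 2 for v, u, l in zip(h, up, lo)]
        r = [a - b for a, b in zip(r, h)]  # subtract the extracted IMF from the residue
    return r


def _seg_cost(x: Series, a: int, b: int) -> float:
    """Squared error of approximating x[a..b] by the line joining its end points."""
    return sum((x[i] - (x[a] + (x[b] - x[a]) * (i - a) / (b - a))) ** 2 for i in range(a, b + 1))


def bottom_up(x: Series, k: int) -> List[int]:
    """Bottom-up segmentation: start from 2-point segments, merge the cheapest pair until k remain."""
    bps = list(range(0, len(x) - 1, 2)) + [len(x) - 1]
    while len(bps) - 1 > k:
        costs = [_seg_cost(x, bps[i - 1], bps[i + 1]) for i in range(1, len(bps) - 1)]
        del bps[costs.index(min(costs)) + 1]  # drop the breakpoint whose removal costs least
    return bps


def mts_symbols(mts: MTS, k: int, eps: float) -> List[List[int]]:
    """EMD trend of each dimension -> bottom-up segments -> slope symbols in {-1, 0, 1}."""
    out = []
    for dim in mts:
        trend = emd_trend(dim)
        bps = bottom_up(trend, k)
        slopes = [(trend[b] - trend[a]) / (b - a) for a, b in zip(bps, bps[1:])]
        out.append([1 if s > eps else -1 if s < -eps else 0 for s in slopes])
    return out


def symbol_similarity(a: List[List[int]], b: List[List[int]]) -> float:
    """Fraction of position-wise equal symbols over all dimensions."""
    total = sum(max(len(sa), len(sb)) for sa, sb in zip(a, b))
    same = sum(x == y for sa, sb in zip(a, b) for x, y in zip(sa, sb))
    return same / total if total else 1.0


def lb_paa(q: MTS, c: MTS, w: int) -> float:
    """PAA lower bound of the multivariate Euclidean distance (requires w <= series length)."""
    total = 0.0
    for dq, dc in zip(q, c):
        n = len(dq)
        for i in range(w):
            lo, hi = i * n // w, (i + 1) * n // w
            s = hi - lo
            total += s * (sum(dq[lo:hi]) / s - sum(dc[lo:hi]) / s) ** 2
    return math.sqrt(total)


def euclid(q: MTS, c: MTS) -> float:
    return math.sqrt(sum((a - b) ** 2 for dq, dc in zip(q, c) for a, b in zip(dq, dc)))


def hierarchical_search(query: MTS, db: List[MTS], radius: float, k: int = 4,
                        eps: float = 1e-3, min_sim: float = 0.75, w: int = 4) -> List[int]:
    """Indices of db series that survive all matching levels."""
    # Level 1: coarse match; the lower bound never discards a series within radius.
    cands = [i for i, c in enumerate(db) if lb_paa(query, c, w) <= radius]
    # Levels 2-3: EMD trends, bottom-up segments, {-1, 0, 1} symbols, second match.
    qs = mts_symbols(query, k, eps)
    cands = [i for i in cands if symbol_similarity(qs, mts_symbols(db[i], k, eps)) >= min_sim]
    # Level 4: exact multivariate Euclidean distance on the survivors.
    return [i for i in cands if euclid(query, db[i]) <= radius]


if __name__ == "__main__":
    q = [[0.1 * t for t in range(32)], [3.2 - 0.1 * t for t in range(32)]]
    near = [[v + 0.01 for v in d] for d in q]
    far = [list(reversed(d)) for d in q]
    print(hierarchical_search(q, [near, far], radius=0.5))  # prints [0]
```

In this sketch only the first level is guaranteed not to discard a true match, because the PAA bound never exceeds the Euclidean distance. The symbolic level is a heuristic filter that trades some recall for fewer Euclidean computations, and `min_sim` controls that trade-off.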