
Multivariate time series analysis based on principal component analysis

Posted on: 2008-08-29
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Yang, Kiyoung
GTID: 1448390005972261
Subject: Computer Science
Abstract/Summary:
A time series is a sequence of observations over time. When there is one observation at each time instant, it is called a univariate time series (UTS); when there is more than one observation, it is called a multivariate time series (MTS). While UTS datasets have been extensively explored, MTS datasets have not been broadly investigated. However, techniques for UTS datasets cannot simply be extended to MTS datasets, because a multivariate time series is different from multiple univariate time series. That is, an MTS item may not be broken into multiple univariate time series and analyzed separately, because doing so loses the correlation information within the multivariate time series.

In this dissertation, we introduce a set of techniques for multivariate time series analysis based on principal component analysis (PCA). As a similarity measure for MTS datasets, we present Eros (Extended Frobenius norm). Eros computes the similarity between two MTS items by comparing their corresponding principal components, using the variances that the principal components represent as weights.

For efficient retrieval of MTS items using Eros, we introduce an index structure for Eros, termed Muse (Multilevel distance-based index structure for Eros). Given a query item, Muse first uses a lower bound of Eros to filter out the MTS items that cannot be among the k nearest neighbors. Muse then refines the remaining MTS items with Eros itself to identify the exact k nearest neighbors of the query item.

Inherently, an MTS item is very high dimensional. Hence, it is generally beneficial to reduce the dimension of the dataset before applying data mining techniques such as classification and clustering, which eliminates irrelevant and/or redundant data.
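The Eros similarity and the filter-then-refine search that Muse performs can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the dissertation's implementation:

- The weight vector is assumed to be the mean of each item's normalized eigenvalue spectrum.
- The distance is assumed to be sqrt(2 - 2·Eros).
- The lower bound is a hypothetical one, not Muse's multilevel bound. It computes only the first m terms exactly and bounds the rest by their weights, since every |cos| is at most 1.
- All function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a sketch, not the dissertation's implementation.


def principal_components(item):
    """Eigenvalues and eigenvectors of an item's covariance matrix.

    `item` is a (time steps, variables) array. Eigenvectors are returned as
    columns, ordered by decreasing eigenvalue (explained variance).
    """
    vals, vecs = np.linalg.eigh(np.cov(item, rowvar=False))  # ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def eros_weights(items):
    """Assumed weighting: mean of each item's normalized eigenvalue spectrum."""
    spectra = []
    for x in items:
        vals = np.clip(principal_components(x)[0], 0.0, None)  # drop round-off negatives
        spectra.append(vals / vals.sum())
    w = np.mean(spectra, axis=0)
    return w / w.sum()


def eros(vecs_a, vecs_b, w, m=None):
    """Sum of w_i * |<a_i, b_i>| over corresponding principal components.

    With m given, only the first m terms are computed; the rest are replaced
    by w_i (because |cos| <= 1), which yields an upper bound on Eros.
    """
    n = len(w) if m is None else m
    cos = np.abs(np.sum(vecs_a[:, :n] * vecs_b[:, :n], axis=0))  # sign-invariant
    return float(np.dot(w[:n], cos) + w[n:].sum())


def eros_distance(similarity):
    """Assumed distance form: sqrt(2 - 2 * Eros), clipped at zero."""
    return float(np.sqrt(max(0.0, 2.0 - 2.0 * similarity)))


def knn_filter_refine(query, items, w, k, m=2):
    """k nearest neighbors: cheap lower bounds first, exact Eros only as needed."""
    _, q_vecs = principal_components(query)
    vecs = [principal_components(x)[1] for x in items]
    # Filter: an upper bound on Eros gives a lower bound on the distance.
    bounds = [eros_distance(eros(q_vecs, v, w, m)) for v in vecs]
    best, refined = [], 0  # best holds (exact distance, index), sorted, at most k
    for idx in np.argsort(bounds):
        if len(best) == k and bounds[idx] > best[-1][0]:
            break  # all remaining bounds are larger, so no closer item remains
        refined += 1
        d = eros_distance(eros(q_vecs, vecs[idx], w))  # refine with exact Eros
        best = sorted(best + [(d, int(idx))])[:k]
    return [i for _, i in best], refined
```

Candidates are visited in increasing lower-bound order, so the loop can stop as soon as the next bound exceeds the current k-th exact distance. Every skipped item is guaranteed to be farther away, and `refined` reports how many exact Eros computations were actually needed.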
For Eros, we present a feature subset selection technique, termed Ropes (Recursive Feature Elimination on Common Principal Components for Eros). Ropes recursively uses the common principal components and the weights to select a subset of features for Eros.

In addition, utilizing the correlation information and Eros, we introduce a set of feature subset selection and feature extraction techniques for multivariate time series datasets, including Corona (Correlation as Features), CLeVer (descriptive Common principal component Loading based Variable subset selection) and KEros. Corona is a supervised feature subset selection technique. It first represents an MTS item by its correlation coefficients, then recursively eliminates one feature at a time based on its contribution to the classification decision boundary. CLeVer is an unsupervised feature subset selection technique that selects features based on their contribution to the common principal components. KEros performs feature extraction based on Kernel PCA, using Eros as the similarity measure between two MTS items.

With the advent of various sensing techniques, there are cases where each data item is represented as an n-way array, where n is greater than 2. One example is functional Magnetic Resonance Imaging (fMRI) data, where each data item is represented as a 3-way array and an fMRI stream as a 4-way array. An n-way array may be flattened into a matrix, to which, for example, Eros can be applied. However, this flattening may lose the spatial correlation. To address this problem, we extend Eros to n-way array datasets, termed nEros (n-way Eros). Intuitively, an n-way array can be unfolded into a matrix in n ways.
For each unfolding, we compute Eros and sum the n results into one similarity value.

Our experimental evaluation on various real-world and synthetic datasets shows that the presented techniques, which exploit the correlation information within MTS items, perform better than traditional approaches that do not use the correlation information, e.g., Euclidean distance.
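The nEros procedure can be sketched with NumPy: unfold an n-way array along each of its n modes, compute Eros on each unfolding, and sum the results. This is a minimal illustration, not the dissertation's code, and it makes two assumptions:

- Each unfolding uses the indices along a mode as variables and flattens all other axes into observations.
- The per-pair Eros weights are the mean of the two normalized eigenvalue spectra.

The function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a sketch, not the dissertation's implementation.


def unfold(array, mode):
    """Mode-`mode` unfolding as an (observations, variables) matrix.

    Assumption for this sketch: the indices along `mode` become the variables
    (columns) and all other axes are flattened into observations (rows).
    """
    return np.moveaxis(array, mode, -1).reshape(-1, array.shape[mode])


def weighted_pcs(matrix):
    """Normalized eigenvalues and eigenvectors (columns), largest first."""
    vals, vecs = np.linalg.eigh(np.cov(matrix, rowvar=False))
    order = np.argsort(vals)[::-1]
    vals = np.clip(vals[order], 0.0, None)  # drop round-off negatives
    return vals / vals.sum(), vecs[:, order]


def eros(a, b):
    """Eros of two matrices; weights assumed to be the mean of both spectra."""
    wa, va = weighted_pcs(a)
    wb, vb = weighted_pcs(b)
    w = (wa + wb) / 2.0
    # |cos| between corresponding principal components, weighted by variance.
    return float(np.dot(w, np.abs(np.sum(va * vb, axis=0))))


def n_eros(x, y):
    """Sum of Eros over all n unfoldings of two same-shaped n-way arrays."""
    if x.shape != y.shape:
        raise ValueError("arrays must have the same shape")
    return sum(eros(unfold(x, k), unfold(y, k)) for k in range(x.ndim))
```

Each unfolding contributes a value between 0 and 1. For two identical arrays the sketch therefore returns n, and smaller values indicate weaker agreement of the principal components across modes.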
Keywords/Search Tags: Time series, MTS items, Principal, Correlation information, Feature subset selection, Eros, N-way array