Research On Dimensionality Reduction And Prediction Methods In Time Series Data Mining

Posted on: 2015-05-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Sun
Full Text: PDF
GTID: 1228330434966124
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Time series data are a very common form of data and arise in a wide range of practical applications. Accordingly, discovering information and knowledge in time series with data mining techniques has become a popular research topic. Research results have been applied successfully in fields such as finance, industry, agriculture, medicine, meteorology, traffic, and computer networks. However, time series differ from traditional static data: they are ordered in time, large in volume, high-dimensional, and multi-featured. It is therefore very important to study how to process and analyse time series data effectively with time series data mining techniques.

This dissertation focuses on time series data. To address the high dimensionality of time series, we study dimensionality reduction, covering both feature selection and feature representation. On the application side, we study time series prediction, covering both single-variable and multivariate prediction. We carry out experiments to evaluate the effectiveness of the proposed methods.

Time series feature selection reduces dimensionality by choosing a smaller feature subset that retains the main information of the original sequence. Taking the time order of time series into account, we propose a feature selection method based on causality discovery. The method is two-dimensional: it selects the feature variables and also determines their effective lagged observations. Moreover, because the selected subset of feature variables and lagged observations is discovered through Granger causality, it supports causal interpretation.

Time series feature representation transforms high-dimensional time series data into a low-dimensional representation while preserving the feature information of the original series. The traditional symbolic method describes the original series using only average values, which may cause information loss. To address this shortcoming, we propose a symbolic aggregate approximation method based on a trend distance, and derive a lower-bounding distance measure. We first define a trend distance that quantitatively measures differences in trend using the starting and ending points of each segment, and then incorporate the trend variations into the original representation. The resulting method represents a time series by both its average feature and its trend feature.

Single-variable time series prediction uses historical values to predict future data. The traditional prediction method based on the Autoregressive Moving Average (ARMA) model cannot incorporate the newest information as it arrives. We build a real-time, self-updating prediction model by combining the difference equation form and the transfer form of the ARMA model. The new model accounts for the influence of new observations, which improves prediction accuracy and reduces computation.

Multivariate time series prediction uses multiple variables to predict the target time series. Our proposed method first selects features from the multiple variables using the causality-discovery-based method, and then predicts the target series with Support Vector Regression. Feature selection removes redundant variables and variables that are independent of the target, which reduces the input dimension of the Support Vector Regression and improves prediction accuracy.
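
To make the feature representation idea above more concrete, the following is a minimal Python sketch of a symbolic representation that keeps a trend value alongside each segment's average symbol. It is an illustration under stated assumptions, not the dissertation's method: the abstract does not give the exact trend distance or the lower-bounding measure, so the trend here is simply the difference between the ending and starting points of each segment, and the names `sax_with_trend` and `trend_gap` are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Scale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def sax_with_trend(series, n_segments, alphabet="abcd"):
    """Return one (symbol, trend) pair per equal-length segment.

    symbol: letter for the segment average, as in a standard symbolic
            aggregate approximation.
    trend:  ending point minus starting point of the segment (an assumed,
            simplified trend feature; the source does not define it).
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    z = znormalize(series)
    width = len(z) // n_segments
    k = len(alphabet)
    # Breakpoints cutting the standard normal into k equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    result = []
    for s in range(n_segments):
        segment = z[s * width:(s + 1) * width]
        avg = mean(segment)
        # Count how many breakpoints lie below the average to pick a letter.
        symbol = alphabet[sum(avg > b for b in breakpoints)]
        trend = segment[-1] - segment[0]
        result.append((symbol, trend))
    return result


def trend_gap(rep_a, rep_b):
    """Hypothetical trend term: sum of squared per-segment trend differences."""
    return sum((ta - tb) ** 2 for (_, ta), (_, tb) in zip(rep_a, rep_b))


rising = sax_with_trend([1, 2, 3, 4, 5, 6, 7, 8], n_segments=2)
falling = sax_with_trend([8, 7, 6, 5, 4, 3, 2, 1], n_segments=2)
print(rising, falling, trend_gap(rising, falling))
```

In this sketch the average symbol captures the level of each segment, while the trend value separates segments whose levels may be similar but whose directions differ.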
Keywords/Search Tags: time series, data mining, dimensionality reduction, feature selection, causal relationship, feature representation, trend distance, prediction