Model-based Time Series Data Mining

Posted on:2009-01-16

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J J Duan

GTID:1118360272958835

Subject:Computer software and theory

Abstract/Summary:

In recent years, there has been an explosion of interest in mining time series data. The model-based approach is one of the most promising for time series data mining, since it can reveal the hidden characteristics of a time series. The hidden Markov model (HMM) is one of the main time series models. This dissertation focuses on algorithms for HMM-based time series mining, while taking the requirements of data stream applications into account. The main research topics are: identifying time series, determining the number of hidden states of an HMM and initializing the model, clustering time series based on HMMs, and predicting stock price time series based on HMMs. The main contributions are as follows.

(1) Determining the number of hidden states of HMMs and model initialization

The traditional HMM selection method, BIC (Bayesian Information Criterion), has several shortcomings, such as the high complexity caused by an excessive number of candidate models. A model selection method, CBIC (Clustering and BIC), is proposed to overcome these shortcomings. The proposed method changes the way candidate models are selected, avoids excessive model training, and thus reduces the algorithm's complexity. The method is based on clustering, with the number of clusters taken as the number of hidden states. Trend degree is defined as a criterion for selecting the candidate model set. To improve clustering performance, an algorithm for detecting mixtures of Gaussian distributions based on kurtosis computation is proposed. It is proved that the kurtosis of a non-overlapping Gaussian mixture equals 3, whereas the kurtosis of an overlapping Gaussian mixture whose components differ is not 3.
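The kurtosis property mentioned above can be illustrated with a small closed-form calculation. This is only a sketch, not the detection algorithm from the dissertation: the function name `mixture_kurtosis` and all parameter values are hypothetical, and it assumes that "kurtosis" means the Pearson kurtosis (fourth central moment divided by the squared variance) of a one-dimensional mixture.

```python
def mixture_kurtosis(weights, means, stds):
    """Pearson kurtosis of a 1-D Gaussian mixture, from its closed-form moments."""
    total = sum(weights)
    w = [x / total for x in weights]          # normalize the mixing weights
    mean = sum(wi * mi for wi, mi in zip(w, means))
    m2 = 0.0  # second central moment (variance) of the mixture
    m4 = 0.0  # fourth central moment of the mixture
    for wi, mi, si in zip(w, means, stds):
        d = mi - mean                         # offset of this component's mean
        m2 += wi * (si**2 + d**2)
        m4 += wi * (d**4 + 6 * d**2 * si**2 + 3 * si**4)
    return m4 / m2**2


# Illustrative parameters only (not taken from the dissertation).
single = mixture_kurtosis([1.0], [0.0], [2.0])                 # one Gaussian
mixed = mixture_kurtosis([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])  # two distinct components
print(single, mixed)
```

With these illustrative values the single Gaussian gives a kurtosis of 3, while the two-component mixture gives roughly 1.72, so a departure from 3 can hint that more than one component is present.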
Comparisons of CBIC with BIC on synthetic and real-world data show that CBIC has lower computational complexity and higher precision for time series representation.

(2) Clustering time series based on HMMs

Model-based clustering algorithms that combine partitional and hierarchical model-based methods, such as Hier-k-HMMs and Hier-moHMMs, are popular. However, this combined approach requires the number of initial partition clusters, the number of hidden states of the HMM, and the number of final clusters to be supplied a priori. Moreover, the initialization of the HMM and of the partition has a great effect on clustering quality. In this dissertation, a novel clustering algorithm, HBHCTS (HMM-Based Hierarchical Clustering Time-Series), is proposed. The initial partition is generated by a distance threshold, which can be effectively determined from the basic HMM probability. The advantages of HBHCTS are as follows. 1) The number of initial partition clusters need not be specified a priori. 2) The number of hidden states of the HMM need not be specified a priori. 3) The cluster representation is easy to interpret. 4) The algorithm is not sensitive to sequence length. 5) Incremental clustering is straightforward, so the algorithm adapts to stream processing. Experimental results show that the proposed approach achieves a higher correctness rate than traditional HMM-based clustering algorithms.

(3) Adaptively predicting stock price time series

An adaptive prediction algorithm, PAAMS (Prediction Algorithm based on Adaptive Model Selection), is proposed. An investigation of the short-term characteristics of stock return rate series verifies that the return rate series can be described by an HMM, although the original stock price series is not suitable for fitting with an HMM. In PAAMS, the time series is transformed in two ways, and the fitted HMM is dynamically updated when the mean prediction error rises to a predefined threshold.
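The threshold-triggered update described above can be sketched as follows. This is a hedged illustration rather than the PAAMS implementation: the abstract does not say which two transformations are used, so the simple return rate below is an assumption, and `simple_returns`, `RefitMonitor`, the window size, and the threshold value are all hypothetical names and settings.

```python
from collections import deque


def simple_returns(prices):
    """Return rate r_t = (p_t - p_{t-1}) / p_{t-1}; one assumed transformation."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


class RefitMonitor:
    """Signals a model refit when the mean absolute prediction error over a
    sliding window reaches a predefined threshold."""

    def __init__(self, threshold, window=5):
        self.threshold = threshold
        self.errors = deque(maxlen=window)  # most recent absolute errors

    def update(self, predicted, actual):
        self.errors.append(abs(predicted - actual))
        mean_error = sum(self.errors) / len(self.errors)
        if mean_error >= self.threshold:
            self.errors.clear()  # start a fresh window after the refit
            return True
        return False


# Illustrative values only.
returns = simple_returns([100.0, 102.0, 99.96])
monitor = RefitMonitor(threshold=0.5, window=2)
signals = [monitor.update(1.0, 1.1), monitor.update(1.0, 2.0), monitor.update(1.0, 1.0)]
print(returns, signals)
```

In a full pipeline, a `True` signal would be the point at which the HMM is re-selected and retrained on recent data; that step is omitted here.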
During the update, the model selection method CBIC is applied to obtain the best number of hidden states and the other model parameters. The feasibility and effectiveness of the proposed prediction algorithm are explained. Experiments on IBM, Dell, and Apple stock price data sets show that PAAMS is more precise than previous studies on the same data sets that use fixed-model techniques, such as the HMM fusion model-based method.
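For orientation, the baseline BIC criterion that CBIC builds on can be sketched as below. This shows plain BIC only, not CBIC, whose candidate selection by clustering is not specified in enough detail here to reproduce. The sketch assumes one-dimensional Gaussian emissions when counting free parameters. The helper names and the log-likelihood values are hypothetical and are not results from the dissertation.

```python
import math


def hmm_free_parameters(n_states):
    """Free parameters of an HMM with 1-D Gaussian emissions (an assumption):
    N(N-1) transition probabilities, N-1 initial probabilities,
    N means and N variances."""
    return n_states * (n_states - 1) + (n_states - 1) + 2 * n_states


def bic(log_likelihood, n_states, n_observations):
    """BIC = -2 ln L + k ln n; lower is better."""
    k = hmm_free_parameters(n_states)
    return -2.0 * log_likelihood + k * math.log(n_observations)


def select_states(log_likelihoods, n_observations):
    """Pick the state count with the lowest BIC.
    log_likelihoods maps n_states -> maximized log-likelihood of a trained HMM."""
    return min(log_likelihoods,
               key=lambda n: bic(log_likelihoods[n], n, n_observations))


# Made-up log-likelihoods for three candidate models, for illustration only.
candidates = {2: -720.0, 3: -690.0, 4: -688.0}
best = select_states(candidates, n_observations=500)
print(best)
```

Note that every candidate in `candidates` must be trained before it can be scored, which is the cost that the abstract says CBIC is designed to reduce.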

Keywords/Search Tags:

Time Series Data Mining, Model Selection, Hidden Markov Model, BIC, Time Series Clustering, Time Series Predicting

Related items

1	Study Of Tunnei Data Based On Time Series Predicting
2	Study On Water Quality Time Series Data Mining And Application Integration
3	Time Series Data Mining Technology And Its Applied Research In The Prediction Of Water Quality
4	Time Series Anomaly Detection Method And Application Based On Autoencoder And HMM
5	Algorithm Study On Short Time Series Mining
6	EMD And BoF Models Based Time Series Data Mining And Applications
7	Research On The Application Of Time Series Clustering Model In Finance
8	Research And Implementation Of Multivariate Time-series Prediction Model
9	Multi-demension Time Series Modeling And Forcasting Analysis
10	Research On Data Mining And Forecasting Methods Over Time Series Data With Complex Structure