
Model-based Time Series Data Mining

Posted on: 2009-01-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Duan
GTID: 1118360272958835
Subject: Computer software and theory
Abstract/Summary:
In recent years, there has been an explosion of interest in mining time series data. The model-based approach is one of the most promising approaches to time series data mining, since it can reveal the hidden characteristics of a time series. The hidden Markov model (HMM) is one of the principal time series models. This dissertation focuses on algorithms for HMM-based time series mining, while taking the requirements of data stream applications into account. The main research topics are: identifying time series, determining the number of hidden states of an HMM and initializing the model, clustering time series based on HMMs, and predicting stock price time series based on HMMs. The main contributions are as follows.

(1) Determining the number of hidden states of an HMM and model initialization

The traditional HMM selection method, BIC (Bayesian Information Criterion), has several shortcomings, such as the high complexity caused by an excessive number of candidate models. A model selection method, CBIC (Clustering and BIC), is proposed to overcome these shortcomings. The proposed method changes the way candidate models are selected, avoids excessive model training, and thus reduces the algorithm's complexity. The method is based on clustering, with the number of clusters taken as the number of hidden states. A trend degree is defined as the criterion for selecting the candidate model set. To improve clustering performance, an algorithm for detecting mixtures of Gaussian distributions, based on kurtosis computation, is proposed. It is proved that the kurtosis of a non-overlapping Gaussian mixture equals 3, whereas the kurtosis of an overlapping Gaussian mixture whose components differ from one another is not 3.
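The kurtosis property can be illustrated numerically. The sketch below is not the dissertation's detection algorithm; it only checks the well-known fact that a single Gaussian has kurtosis 3 and shows one overlapping two-component mixture whose kurtosis departs from 3. It assumes that the "non-overlapping" case corresponds to data drawn from one Gaussian component. The function name `kurtosis`, the sample size, and the mixture parameters are illustrative choices, not taken from the source.

```python
import random


def kurtosis(xs):
    """Non-excess sample kurtosis: fourth central moment / squared variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


rng = random.Random(0)
n = 200_000

# One Gaussian component: kurtosis is 3 for any mean and variance.
single = [rng.gauss(1.0, 2.0) for _ in range(n)]

# Assumed example of overlap: equal-weight mixture of N(-1, 1) and N(+1, 1).
# Its theoretical kurtosis is 10 / 4 = 2.5, so it differs from 3.
mixture = [rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 1.0) for _ in range(n)]

k_single = kurtosis(single)
k_mixture = kurtosis(mixture)
print(f"single Gaussian kurtosis:      {k_single:.2f}")
print(f"overlapping mixture kurtosis:  {k_mixture:.2f}")
```

With a large sample, the first value should sit close to 3 and the second close to 2.5, which is the kind of gap a kurtosis-based test could use to flag a cluster that still contains more than one component.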
The effectiveness of CBIC is compared with that of BIC; experimental results on synthetic and real-world data show that CBIC has lower computational complexity and higher precision for time series representation.

(2) Clustering time series based on HMMs

Model-based clustering algorithms that combine partitional and hierarchical model-based methods, such as Hier-k-HMMs and Hier-moHMMs, are popular. However, such combined methods require the number of initial partition clusters, the number of hidden states of the HMM, and the number of final clusters to be supplied a priori. Moreover, the initialization of the HMM and of the partition has a great effect on clustering quality. In this dissertation, a novel clustering algorithm, HBHCTS (HMM-Based Hierarchical Clustering Time-Series), is proposed. The initial partition is generated using a distance threshold, which can be determined effectively from the basic HMM probability. The advantages of HBHCTS are as follows. 1) The number of initial partition clusters need not be specified a priori. 2) The number of hidden states of the HMM need not be specified a priori. 3) The representation of the clusters is easy to interpret. 4) The algorithm is not sensitive to sequence length. 5) Incremental clustering is easy to perform and is suited to stream processing. The experimental results show that the proposed approach achieves a higher correctness rate than the traditional HMM-based clustering algorithm.

(3) Adaptively predicting stock price time series

An adaptive prediction algorithm, PAAMS (Prediction Algorithm based on Adaptive Model Selection), is proposed. An investigation of the short-term characteristics of stock return rate series verifies that the return rate series can be described by an HMM, although the original stock price series is not suitable for fitting with an HMM. In PAAMS, the time series is transformed in two ways, and the fitted HMM is dynamically updated when the mean prediction error rises to a predefined threshold.
During the update process, the model selection method CBIC is applied to obtain the best number of hidden states and the other model parameters. The feasibility and effectiveness of the proposed prediction algorithm are explained. Experiments on IBM, Dell and Apple stock price data sets show that PAAMS is more precise than previous studies on the same data sets that used fixed-model techniques, such as the HMM fusion model-based method.
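The threshold-triggered update described for PAAMS can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: the fitted HMM is replaced by a hypothetical `MeanModel` stand-in, the error measure is assumed to be the mean absolute error over a recent window, and the return rate is assumed to be the simple return (p_t - p_{t-1}) / p_{t-1}. The names `simple_returns`, `MeanModel`, `adaptive_forecast`, `window`, and `threshold` are all introduced here for illustration.

```python
from statistics import fmean


def simple_returns(prices):
    """Assumed transformation: r_t = (p_t - p_{t-1}) / p_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


class MeanModel:
    """Hypothetical stand-in for the fitted HMM: predicts the training mean."""

    def __init__(self, history):
        self.level = fmean(history)

    def predict(self):
        return self.level


def adaptive_forecast(series, window=5, threshold=0.02):
    """One-step-ahead forecasts; refit when the recent mean error is too high."""
    model = MeanModel(series[:window])
    errors, predictions, refits = [], [], []
    for t in range(window, len(series)):
        pred = model.predict()
        predictions.append(pred)
        errors.append(abs(series[t] - pred))
        recent = errors[-window:]
        # Refit on the latest `window` observations once the mean absolute
        # error over a full window exceeds the predefined threshold.
        if len(recent) == window and fmean(recent) > threshold:
            model = MeanModel(series[t - window + 1 : t + 1])
            refits.append(t)
            errors.clear()
    return predictions, refits


# Synthetic return series with a regime change halfway through.
returns = [0.01] * 20 + [0.05] * 20
predictions, refits = adaptive_forecast(returns)
print("refit at time steps:", refits)
```

In a fuller version, the refit step is where a model selection procedure would choose the number of hidden states and re-estimate the parameters; the stand-in model only shows when an update is triggered, not how the model is chosen.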
Keywords/Search Tags: Time Series Data Mining, Model Selection, Hidden Markov Model, BIC, Time Series Clustering, Time Series Predicting