Font Size: a A A

A Bayesian approach to temporal data clustering using the hidden Markov model methodology

Posted on:2001-01-28Degree:Ph.DType:Thesis
University:Vanderbilt UniversityCandidate:Li, CenFull Text:PDF
GTID:2468390014958096Subject:Computer Science
Abstract/Summary:
Most real world systems are dynamic in nature. A typical approach to understanding the behavior of dynamic systems and the complex phenomena associated with them is to build and analyze models of the system behavior. In some well studied domains, for example engineering systems and speech recognition applications, there exists sufficient knowledge to construct detailed models of significant phenomena. But in many other domains, such as economics, the stock market, and medical applications, this type of knowledge may not be available, or is incomplete. In these cases, data driven approaches, such as the temporal data clustering and modeling methodology that we have developed in this thesis, provide a feasible alternative for building models to interpret and analyze dynamic system behavior. The goal of cluster analysis is to first construct homogeneous groups of data objects, and then to build models that characterize each group.; Our temporal data clustering and modeling approach employs the hidden Markov model (HMM) methodology. A HMM defines the set of states required to characterize a dynamic process as a non-deterministic (probabilistic) finite state automata. This provides a well understood framework for modeling and analysis of dynamic system behavior. Past work on clustering with HMMs assumed that the model structure is predefined, and the learning task only involves estimating the model parameters, i.e., the transition and emission probabilities. This limited the clustering's application to domains where sufficient knowledge of dynamic phenomena was already available. However, the approach presented in this thesis directly induces the HMM model sizes from data, making it applicable to a much wider range of problems.; This dissertation develops a complete algorithm for temporal data clustering using the HMM methodology, where partition structure and HMM model size selection are performed as part of the clustering process. We have cast the problem of HMM model size selection in the Bayesian model selection framework. In addition, we have adopted the Bayesian finite mixture clustering framework for the HMM-based clustering problem. We empirically study the characteristics of the Bayesian model selection criteria as they are applied to the model size selection and partition model selection problems. A very important problem when dealing with real data, the data insufficiency problem, is systematically studied. Its immediate effect is the generation of local maxima in the ML model parameter estimation method, and we show how this affects the quality of the HMM model size selection and partition model selection tasks.; We formalize our clustering approach as four nested search loops. We analyze the computational complexity of a brute force 4-loop approach to clustering temporal data. To overcome the high computational complexity of the brute force approach, we introduce heuristics in the two primary model selection steps. We also propose a more effective HMM model initialization method to speed up and to improve the quality of the HMM model parameter estimation process. This results in an efficient four step search control structure, that exploits the characteristics of the criterion functions used for model selection at the partition and cluster levels. Experimental results show that this approach is effective in finding the optimal partition structure for the data.
Keywords/Search Tags:Approach, Data, Model, Bayesian, Dynamic, Partition, Methodology, Structure
Related items