Font Size: a A A

A Survey Of Hidden Markov Model Based Clustering Algorithm On Time Series

Posted on:2012-11-25Degree:MasterType:Thesis
Country:ChinaCandidate:S T YaoFull Text:PDF
GTID:2120330338499470Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Hidden Markov Model(HMM) is a statistical model which assumes the observa-tion sequence is generated by a Markov process with hidden states. Hidden MarkovModel is widely applied to pattern recognition of speech, handwriting, trajectories andbioinformatics. There are two remarkable advantages of HMM based clustering algo-rithms: 1)They are suitable for handling time series data with unequal length or evenmissing value. 2)They are capable of extracting the hidden property(Markov prop-erty) underlying the time series to improve the clustering accuracy. Over the years,investigations on HMM based clustering algorithms largely focus on practical appli-cations, while the accuracy and robustness of the algorithms remain less explored. Inthis thesis, we present a statistical model based clustering algorithm which constructsthe similarity matrix by calculating the Kullback-Leibler Divergence(KLD) betweenposteriori distributions of time series pairs under different model parameters, spectralclustering is then applied to the mapped data.Unlike most existing HMM based clustering algorithms, the approach we pro-posed adopts KLD to evaluate the similarity between time series. KLD integratesover the whole probability space, utilizing more information from the statistical mod-el. Experimental results on both synthetic and real data shows the proposed algorithmachieves higher accuracy compared to its peers with different distance measures un-der the same condition. Besides, spectral clustering algorithm effectively reduced thenoise from the data by eigenvector decomposition, which is more robust to handle thesynthetic data with intrinsic or extrinsic noise compared to other algorithms such asK-Means. The main contribution of this thesis are as follows:(1)We study how distance measure function and eigenvector decomposition affect theclustering accuracy. Previous HMM based clustering algorithms mostly adopt mutual fitness score or BP metric to calculate the distance between time series. Although thesemeasures are well interpreted in some sense, they merely take advantage of informa-tion between a specific pair of time series, instead of considering the global probabilityspace. The locality of this measure will lead to a decrease in clustering accuracy. Inaddition, when the noise level of time series increases, the accuracy of conventionalclustering algorithms such as K-Means, hierarchical clustering is going to decreasenotably. The above-mentioned issues are effectively addressed by introducing KLDand spectral clustering.(2)We study how hidden states number affects the clustering accuracy. The numberof hidden states is normally preset in most applications. Despite the fact that the hid-den states are endowed definite meanings in some applications(i.e. hidden states mayrepresent syllables in speech recognition), they more often lack physical meanings.Our survey on clustering accuracy given different hidden states number shown that theaccuracy changes unmonotonously with the hidden states number increasing. And itwill decrease when the model is overfitted. Moreover, the training time augments insquare magnitude with the state number increasing.(3)We study how to learn an optimal cluster number. Cluster number is usually prede-fined, however in many practical applications, it is impossible to know this number be-forehand. We thus desire to find a criterion to evaluate the quality of clustering results.In this thesis, we adoptαvalue to do evaluation, which not only locally maximizesthe intra-class similarities but also prevents the cluster with very few observations.Experimental results shown that this criterion are capable of finding correct clusternumber.
Keywords/Search Tags:Hidden Markov Model(HMM), Kullback-LeiblerDivergence(KLD), spectral clustering, eigenvector decomposition
PDF Full Text Request
Related items