Bayesian model-based clustering of short time series

Posted on: 2009-03-17
Degree: Ph.D.
Type: Dissertation
University: Boston University
Candidate: Wang, Ling
GTID: 1448390002494296
Subject: Biology
Abstract/Summary:
Cluster analysis is a common technique used in many fields, such as pattern recognition, machine learning and bioinformatics. Many existing clustering methods are designed for independent observations and are therefore not suited to time series data. Methods developed specifically for time series use autoregressive models to describe the temporal pattern, but they perform poorly on short series because of the stationarity assumptions of autoregressive models. Here I propose a Bayesian model-based approach for clustering short time series data. The method uses polynomial models to describe the underlying temporal profiles and a Bayesian model selection approach for clustering: two profiles belong to the same cluster when they are generated from the same model. I first review and compare existing clustering methods, especially model-based ones, and then develop my approach for clustering gene expression data from short microarray experiments. I use the model formulation to derive a Bayesian score for clustering. I evaluate the performance of the method on simulated data and apply it to a real microarray experiment studying the immune response to Helicobacter pylori infection. On both the simulated and the real data, the proposed method performs better than STEM, another program designed for clustering short temporal data. Because time course microarray experiments are often performed across several experimental conditions, I extend the method to cluster this type of data while taking the experimental conditions into account. I further use an iterative procedure to search for the set of genes that have similar expression patterns across experimental conditions and the set of genes that behave in a unique fashion under a specific experimental condition. I evaluate the performance of this extended clustering method on simulated data and apply it to gene expression data from an experiment studying T-cell activation under various stimuli.
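The clustering rule described in the abstract, that two profiles belong together when a shared polynomial model explains them better than separate models, can be sketched in a few lines. This is a minimal illustration only and not the dissertation's actual score: it assumes a conjugate normal-inverse-gamma prior on the polynomial coefficients and the noise variance, and the function names (`log_marginal`, `log_merge_score`) and hyperparameters (`tau2`, `a0`, `b0`) are hypothetical choices made for the example.

```python
import math
import numpy as np


def log_marginal(profiles, times, degree=1, tau2=10.0, a0=1.0, b0=1.0):
    """Log marginal likelihood of profiles that share ONE polynomial model.

    Assumed prior (illustrative, not taken from the dissertation):
      beta | sigma^2 ~ Normal(0, sigma^2 * tau2 * I),  sigma^2 ~ InvGamma(a0, b0).
    """
    profiles = np.atleast_2d(np.asarray(profiles, dtype=float))
    # Polynomial basis 1, t, t^2, ... evaluated at the shared time points.
    basis = np.vander(np.asarray(times, dtype=float), degree + 1, increasing=True)
    # Every profile in the cluster uses the same coefficients: stack the basis.
    X = np.tile(basis, (profiles.shape[0], 1))
    y = profiles.ravel()
    n, p = X.shape

    prior_precision = np.eye(p) / tau2
    post_precision = prior_precision + X.T @ X
    xty = X.T @ y
    post_mean = np.linalg.solve(post_precision, xty)

    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * (y @ y - post_mean @ xty)

    _, logdet_prior = np.linalg.slogdet(prior_precision)
    _, logdet_post = np.linalg.slogdet(post_precision)
    return (-0.5 * n * math.log(2.0 * math.pi)
            + 0.5 * (logdet_prior - logdet_post)
            + a0 * math.log(b0) - a_n * math.log(b_n)
            + math.lgamma(a_n) - math.lgamma(a0))


def log_merge_score(profile_a, profile_b, times, **prior):
    """Log Bayes factor: one shared model versus two separate models."""
    merged = log_marginal([profile_a, profile_b], times, **prior)
    separate = (log_marginal([profile_a], times, **prior)
                + log_marginal([profile_b], times, **prior))
    return merged - separate


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]                 # five time points: a short series
    rising_1 = [0.0, 1.1, 1.9, 3.0, 4.1]    # made-up toy profiles
    rising_2 = [0.1, 0.9, 2.1, 3.1, 3.9]
    falling = [4.0, 3.1, 1.9, 1.0, 0.1]
    print(log_merge_score(rising_1, rising_2, times))
    print(log_merge_score(rising_1, falling, times))
```

A positive score favors placing the two profiles in the same cluster and a negative score favors keeping them apart. An agglomerative procedure could repeatedly merge the pair of clusters with the largest positive score until no merge improves it, although the search strategy actually used in the dissertation is not stated in the abstract.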
Keywords/Search Tags:Time series, Cluster, Data, Short, Bayesian, Model-based