Approaches to clustering gene expression time course data

Posted on: 2007-11-13

Degree: M.S.

Type: Thesis

University: State University of New York at Buffalo

Candidate: Krishnamurthy, Praveen

GTID: 2458390005989051

Subject: Biology

Abstract/Summary:

Conventional techniques for clustering gene expression time course data have either ignored the temporal structure, by treating time points as independent, or have used parametric models whose complexity must be fixed beforehand. In this thesis, we apply a non-parametric version of the traditional hidden Markov model (HMM), the hierarchical Dirichlet process hidden Markov model (HDP-HMM), to the task of clustering gene expression time course data. The HDP-HMM is an instantiation of an HMM in the hierarchical Dirichlet process (HDP) framework of Teh et al. (2004): a non-parametric prior is placed on the number of hidden states, allowing a countably infinite number of them and thereby removing the need to fix model complexity. At the same time, placing the Dirichlet process in a hierarchical framework lets the same countably infinite set of "next states" in the Markov chain be shared across the transition distributions without constraining the flexible architecture of the model. We describe the algorithm in detail and compare its results with those of traditional methods on two popular datasets, Iyer et al. (1999) and Cho et al. (1998). We show that a non-parametric hierarchical model such as ours can solve complex clustering tasks effectively without fixing the model complexity beforehand, while also avoiding overfitting.
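
The shared set of "next states" described in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm, only a truncated prior draw under stated assumptions: the function names, the concentration parameters `gamma` and `alpha`, and the truncation level are all hypothetical choices made for illustration. A global weight vector `beta` is drawn by stick-breaking, and every row of the transition matrix is then drawn from a Dirichlet centred on that same `beta`, which is what ties the rows to a common set of states.

```python
import random

def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights (assumed truncation, not infinite)."""
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass is assigned to the last state
    return weights

def dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_hdp_hmm_transitions(gamma, alpha, truncation, seed=0):
    """Draw global state weights and per-state transition rows.

    gamma, alpha and truncation are illustrative parameters, not values
    taken from the thesis.
    """
    rng = random.Random(seed)
    beta = stick_breaking(gamma, truncation, rng)
    # Each row is centred on the same beta, so all rows share one state set.
    # The small floor keeps every Dirichlet parameter strictly positive.
    rows = [
        dirichlet([max(alpha * b, 1e-12) for b in beta], rng)
        for _ in range(truncation)
    ]
    return beta, rows

if __name__ == "__main__":
    beta, rows = sample_hdp_hmm_transitions(gamma=2.0, alpha=5.0, truncation=8)
    print("global weights:", [round(b, 3) for b in beta])
    print("first transition row:", [round(p, 3) for p in rows[0]])
```

A larger `alpha` in this sketch pulls each row closer to `beta`, while a smaller one lets individual rows concentrate on a few next states; a full clustering method would additionally need emission distributions and posterior inference, which are not shown here.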

Keywords/Search Tags:

Gene expression time course, Clustering, Model, HMM

Related items

1	The Approach To Mining Time-lagged Coregulated Gene And Research On Fuzzy Clustering Algorithm
2	The Research And Application On Gene Expression By Clustering Algorithms
3	Research On Clustering Methods For Analyzing Overlapping Local Gene Expression Patterns
4	The Research And Implementation On Clustering Algorithm Of Gene Expression Data
5	Microarray time-series data clustering via gene expression profile alignment
6	Clustering algorithms for time series gene expression in microarray data
7	The Design And Analysis Of Clustering Algorithms On Gene Expression Data
8	The Research On Clustering Algorithm Applied To Gene Expression Data
9	Study Of Gene Expression Data Analysis Based On Pattern Recognition Methods
10	Research Of Co-clustering Algorithms For Cancer Subtypes Discovery Based On Gene Expression Data