
Topics in the analysis of sparse and irregularly spaced time dependent gene expression data

Posted on: 2011-05-17
Degree: Ph.D.
Type: Dissertation
University: Columbia University
Candidate: Sinha, Anshu
Full Text: PDF
GTID: 1448390002456157
Subject: Biology
Abstract/Summary:
Time-dependent microarray profiles examine the expression of genes over a time domain, with the goal of understanding the dynamic characteristics of biological systems. However, these data sets are typically sparse, both in number of time points and in replicates, and the time points are irregularly spaced, which presents hurdles for their successful analysis. While the literature has offered potential solutions, there is no standard accepted analysis methodology, evaluation method, or platform.

This study contributes to the field of gene expression analysis by: 1) providing a nonparametric clustering approach for sparse data to better uncover biological processes, 2) developing methods to determine significantly expressed genes over time, 3) producing a platform for the end-to-end analysis of time course data, and 4) comparing the presented methodology with existing methodology.

With an eye to the potential benefits of tying relevant disciplines together, the central tenet of these methods is the use of biologically relevant features. Features summarize gene expression profiles, incorporate dependence across time, and appropriately describe the data, augmenting the information provided by the curve. The clustering method used a combination of features with standard clustering algorithms to produce clusters with more focused biology than existing methodologies (STEM, ASTRO). In the radiation data set, it uncovered relevant biology not found by the comparison methods, suggesting novel regulators (KDM5B, HDACs) that could epigenetically regulate gene expression as part of the dynamic cellular response to radiation. Similarly, the significance method used the Area Under the Curve (AUC) with standard significance tests to compare overall expression in selected time frames. Multiple test comparison methods were reviewed in the context of determining reasonable estimates for the number of truly null hypotheses tested. This method also outperformed existing approaches (EDGE, SAM, maSigPro) in terms of biological information captured. Finally, the Processing Expression of Short Time Series (PESTS) platform was introduced to facilitate researcher needs, implementing the methods developed here with a focus on usability and transparency. It is the only end-to-end platform currently available. By tying together biology, analysis methods, and platform development, this study showed that the meaningful information in gene expression data can be more completely captured.
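
The AUC-based significance idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which integration rule or which significance test the dissertation used, so the trapezoidal rule and the exact permutation test below are assumptions chosen for illustration, and the function names, time points, and expression values are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def auc_trapezoid(times, values):
    """Area under a piecewise-linear expression profile (trapezoidal rule).

    Works with irregularly spaced time points because each interval is
    weighted by its own width.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need matching sequences with at least two points")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width <= 0:
            raise ValueError("time points must be strictly increasing")
        area += width * (values[i] + values[i - 1]) / 2.0
    return area


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation test on the difference in mean AUC.

    Assumption: this stands in for the unspecified "standard significance
    tests" mentioned in the abstract.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    hits = total = 0
    # Enumerate every way of relabelling the pooled replicates.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


# Hypothetical log-ratio profiles for one gene at irregular time points (hours).
times = [0, 0.5, 2, 6, 24]
treated = [
    [0.0, 0.8, 1.5, 1.2, 0.3],
    [0.0, 0.9, 1.4, 1.1, 0.4],
    [0.1, 0.7, 1.6, 1.3, 0.2],
]
control = [
    [0.0, 0.1, 0.0, -0.1, 0.0],
    [0.1, 0.0, 0.1, 0.0, 0.1],
    [0.0, -0.1, 0.1, 0.1, 0.0],
]

treated_auc = [auc_trapezoid(times, profile) for profile in treated]
control_auc = [auc_trapezoid(times, profile) for profile in control]
p_value = permutation_pvalue(treated_auc, control_auc)
print(treated_auc, control_auc, p_value)
```

With three replicates per group there are only 20 possible relabellings, so the smallest two-sided p-value this test can return is 2/20 = 0.1. That floor is one concrete way in which sparse replication limits the significance analysis of time course data.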
Keywords/Search Tags: Expression, Time, Data, Methods, Platform, Sparse