
Classification and alignment of gene-expression time-series data

Posted on: 2010-06-04
Degree: Ph.D
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Smith, Adam Allen
Full Text: PDF
GTID: 1448390002476615
Subject: Computer Science
Abstract/Summary:
We present methods for comparing and performing similarity queries for gene-expression time-series data. Such data is usually gathered via microarrays or related technologies. In the studies with which we work, the methods are used to compare the gene activity of mice after exposure to different treatments, or with specific genes knocked out. This lets us compare the effects of the treatments or knockouts at a molecular level. The data tends to be sparse in time, but it represents measurements for thousands or tens of thousands of separate genes, each of which constitutes a separate dimension. Such data is also subject to technical noise and biological variability.

Our approach involves three key steps. The first step is to reconstruct a continuous time series from the discrete observations. We use B-splines to accomplish this. Unlike previous methods, we relax the fit of the splines so that they are less prone to overfitting the data. We place the points of discontinuity in the spline in such a way that a spline is well-defined over the whole length of the series.

The second step is to align the pairs of time series in order to find a time-by-time correspondence that maximizes the similarity between them. We present two segment-based algorithms that are specially designed to align gene-expression data. We also develop heuristics to speed up the alignment computations without adversely affecting the quality of the alignments found. Finally, we present an approach for computing clustered alignments, in which the genes are split into a small number of clusters, each of which is aligned independently.

The final step is to score the alignments found, based on the similarity of the two series. This allows us to conduct similarity searches, in which we compare a query of unknown character to series associated with other treatments that have been well studied. One of our high-level goals is to create a BLAST-like tool that will allow biologists to enter the gene-expression data from their own studies and will return treatments that affect gene expression in similar ways.
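
The align-then-score idea in the abstract can be illustrated with a small sketch. It does not implement the dissertation's segment-based algorithms, which the abstract does not specify. It substitutes classic dynamic time warping (DTW) on a single gene's series, and uses the DTW cost as a stand-in similarity score. The function names `dtw_align` and `rank_treatments`, the treatment labels, and the toy values are all hypothetical.

```python
from typing import Dict, List, Tuple


def dtw_align(a: List[float], b: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two series with classic DTW (an illustrative stand-in only).

    Returns the total alignment cost and the list of matched index pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimum cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Trace back from the end to recover the time-by-time correspondence.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def rank_treatments(query: List[float], references: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank reference series by alignment cost (lower cost = more similar)."""
    scored = [(name, dtw_align(query, series)[0]) for name, series in references.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy data: the query matches treatment_A after a one-step delay.
query = [0.0, 1.0, 2.0, 1.0, 0.0]
references = {
    "treatment_A": [0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
    "treatment_B": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
}
ranking = rank_treatments(query, references)
```

In this toy case `ranking` lists `treatment_A` first, because the warping absorbs the delay between the two series. A real query would span thousands of genes and would start from the reconstructed continuous series rather than raw observations.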
Keywords/Search Tags:Data, Gene-expression, Series, Time, Similarity, Treatments