Classification and alignment of gene-expression time-series data

Posted on: 2010-06-04

Degree: Ph.D.

Type: Dissertation

University: The University of Wisconsin-Madison

Candidate: Smith, Adam Allen

GTID: 1448390002476615

Subject: Computer Science

Abstract/Summary:

We present methods for comparing gene-expression time-series data and for performing similarity queries on it. Such data is usually gathered with microarrays or related technologies. In the studies we work with, the methods are used to compare gene activity in mice exposed to different treatments or with specific genes knocked out, which lets us compare the effects of those treatments or knockouts at the molecular level. The data tends to be sparse in time, but it contains measurements for thousands or tens of thousands of genes, each of which constitutes a separate dimension. It is also subject to technical noise and biological variability.

Our approach involves three key steps. The first step is to reconstruct a continuous time series from the discrete observations, which we do with B-splines. Unlike previous methods, we relax the fit of the splines so that they are less prone to overfitting the data. We place the knots of the spline (the points at which its polynomial pieces join) so that the spline is well defined over the whole length of the series.

The second step is to align pairs of time series to find the time-to-time correspondence that maximizes their similarity. We present two segment-based algorithms designed specifically for aligning gene-expression data, and we develop heuristics that speed up the alignment computations without degrading the quality of the alignments found. Finally, we present an approach for computing clustered alignments, in which the genes are split into a small number of clusters, each of which is aligned independently.

The final step is to score each alignment by the similarity of the two aligned series. This lets us conduct similarity searches, in which we compare a query of unknown character against series associated with other, well-studied treatments.
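The first step, reconstructing a continuous curve from a few noisy samples with a relaxed B-spline fit, can be illustrated with a small sketch. The abstract does not say how the fit is relaxed, so this example assumes a penalized least-squares fit (a P-spline with a second-difference penalty on the coefficients) as a stand-in technique. It uses a clamped knot vector, so the spline is defined over the whole sampled interval. The names `clamped_knots`, `bspline_basis`, `fit_penalized_spline`, the parameter `lam`, and the sample values are all hypothetical.

```python
# Hypothetical sketch: penalized B-spline (P-spline) smoothing in pure Python.
# Assumption: the "relaxed" fit is modeled as least squares plus a
# second-difference penalty on the spline coefficients (a stand-in technique).


def clamped_knots(a, b, n_interior, k=3):
    """Knot vector with k+1 repeated end knots, so the spline covers all of [a, b]."""
    step = (b - a) / (n_interior + 1)
    interior = [a + step * (j + 1) for j in range(n_interior)]
    return [a] * (k + 1) + interior + [b] * (k + 1)


def bspline_basis(i, k, t, x):
    """Value at x of the i-th B-spline of degree k on knot vector t (Cox-de Boor)."""
    if k == 0:
        if t[i] <= x < t[i + 1]:
            return 1.0
        # Close the last non-empty interval so the right endpoint is covered.
        if x == t[-1] and t[i] < t[i + 1] == t[-1]:
            return 1.0
        return 0.0
    value = 0.0
    if t[i + k] != t[i]:
        value += (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k + 1] != t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
        value += right * bspline_basis(i + 1, k - 1, t, x)
    return value


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_penalized_spline(times, values, n_interior=4, lam=1.0, k=3):
    """Return a callable smoothing spline for one gene's (time, value) samples.

    Larger lam penalizes curvature more, relaxing the fit to the observations.
    """
    t = clamped_knots(min(times), max(times), n_interior, k)
    n = len(t) - k - 1  # number of basis functions
    B = [[bspline_basis(j, k, t, x) for j in range(n)] for x in times]
    # Normal equations: (B^T B + lam * D^T D) c = B^T y, with D = second differences.
    A = [[sum(row[p] * row[q] for row in B) for q in range(n)] for p in range(n)]
    for r in range(n - 2):
        d = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for p, dp in d.items():
            for q, dq in d.items():
                A[p][q] += lam * dp * dq
    rhs = [sum(row[p] * y for row, y in zip(B, values)) for p in range(n)]
    coef = solve(A, rhs)
    return lambda x: sum(c * bspline_basis(j, k, t, x) for j, c in enumerate(coef))


# Usage with made-up samples for a single gene (not data from the dissertation):
times = [0, 1, 2, 4, 8, 12]
expr = [0.1, 0.9, 1.4, 1.2, 0.6, 0.3]
curve = fit_penalized_spline(times, expr, lam=0.5)
dense = [curve(0.5 * s) for s in range(25)]  # resample on a regular grid
```

Because the knot vector is clamped, the fitted curve can be evaluated anywhere from the first to the last observation, which is what allows two series sampled at different times to be resampled onto a common grid before alignment.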
One of our high-level goals is to create a BLAST-like tool that lets biologists enter gene-expression data from their own studies and returns treatments that affect gene expression in similar ways.
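The align-score-search loop behind such a tool can be sketched as follows. The sketch assumes every series has already been resampled onto a regular time grid, with each time point stored as a vector of per-gene values. It uses plain dynamic time warping (DTW) as a stand-in for the dissertation's segment-based alignment algorithms, which the abstract does not specify, and it scores an alignment by its mean per-step Euclidean distance, where lower means more similar. The functions `distance`, `dtw_align`, and `search`, the treatment names, and the data are hypothetical.

```python
import math

# Hypothetical sketch: plain dynamic time warping (DTW), NOT the dissertation's
# segment-based algorithms, used to align a query against each reference series.


def distance(u, v):
    """Euclidean distance between two expression vectors (one value per gene)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_align(x, y):
    """Align series x and y (lists of per-time-point gene vectors).

    Returns (total_cost, path), where path lists the matched (i, j) index pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            D[i][j] = distance(x[i - 1], y[j - 1]) + step
    # Trace back from the end, preferring the diagonal move on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(moves, key=lambda ij: D[ij[0]][ij[1]])
    path.reverse()
    return D[n][m], path


def search(query, database):
    """Rank treatments by mean aligned distance to the query (most similar first)."""
    scored = []
    for name, series in database.items():
        cost, path = dtw_align(query, series)
        scored.append((cost / len(path), name))
    scored.sort()
    return [(name, score) for score, name in scored]


# Usage with made-up two-gene series (not data from the dissertation):
query = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
database = {
    "treatment_A": [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    "treatment_B": [[3.0, 3.0], [2.0, 2.0], [1.0, 1.0], [0.0, 0.0]],
}
print(search(query, database))
```

Here `treatment_A` is a time-stretched copy of the query, so the warping absorbs the delay and it ranks first. This sketch aligns all genes jointly. A clustered alignment, as described in the abstract, would instead run the alignment separately for each cluster of genes.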

Keywords/Search Tags:

Data, Gene-expression, Series, Time, Similarity, Treatments

Related items

1	Mining Coregulated Biclusters From Time-series Gene Expression Data
2	Clustering algorithms for time series gene expression in microarray data
3	Microarray time-series data clustering via gene expression profile alignment
4	Time Series Prediction In Stock-Price Index And Stock-Price Based On Gene Expression Programming
5	The Research Of Key Techniques For Function Mining And Time Series Analysis By Gene Expression Programming
6	Stock Investment Decision System Based On Gene Expression Programming
7	Association Rules Mining And Its Applications In Microarray Gene Expression Data
8	Devising effective similarity measures and learning algorithms for the study of metazoan gene expression
9	The Approach To Mining Time-lagged Coregulated Gene And Research On Fuzzy Clustering Algorithm
10	The Application Research Of Time Series Data Mining On Securities Market Prediction