Clustering raw distributions of intensities from Affymetrix gene expression microarrays in order to evaluate statistical preprocessing methods

Posted on: 2011-05-26
Degree: Ph.D
Type: Dissertation
University: Southern Methodist University
Candidate: Zou, Kun
Full Text: PDF
GTID: 1448390002960058
Subject: Biology
Abstract/Summary:
Gene expression microarrays have been used for approximately ten years to elucidate the genetic mechanisms behind common diseases and biological processes, and there is a large volume of research on the analysis of such data. Typically, the data must be preprocessed prior to statistical testing because non-biological noise is prevalent. This research focuses on clustering raw distributions of intensities from Affymetrix gene expression microarrays in order to determine which properties of the data affect the performance of statistical preprocessing methods. First, information is gathered on the distribution of intensities in a variety of Affymetrix gene expression microarray data sets. A new clustering method, based on a visual definition of the shape of the distributions, is developed to cluster the array experiments by their statistical distributions. Next, various preprocessing pipelines are applied to each data set within each distribution cluster to determine whether there is a one-to-one correspondence between distribution cluster and pipeline performance. The area under the receiver operating characteristic curve (AUC) is used to measure performance on spike-in data sets. Significant Gene Ontology (GO) terms and the intra-class correlation coefficient (ICC) are used to evaluate performance on real gene expression data sets. Subsequently, we study whether, for all data sets assigned to a given class, there is a single best preprocessing pipeline that can be applied generally and that generates the most biologically meaningful result. v...
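The abstract does not say how the AUC was computed, so the following is only a minimal sketch of one standard way to obtain it for a spike-in experiment, where the truly changed probe sets are known. It uses the rank-sum identity: the AUC equals the fraction of (spiked, non-spiked) pairs in which the spiked probe set receives the higher score, with ties counted as one half. The function name `auc_from_scores` and its arguments are hypothetical and are not taken from the dissertation.

```python
def auc_from_scores(scores, is_spiked):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores    -- one differential-expression score per probe set
                 (higher means "more likely changed"); hypothetical input
    is_spiked -- matching booleans, True for known spike-in probe sets
    """
    spiked = [s for s, flag in zip(scores, is_spiked) if flag]
    background = [s for s, flag in zip(scores, is_spiked) if not flag]
    if not spiked or not background:
        raise ValueError("need at least one spiked and one non-spiked probe set")

    # Count the pairs in which the spiked probe set outranks the
    # non-spiked one; a tie contributes half a win.
    wins = 0.0
    for s in spiked:
        for b in background:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(spiked) * len(background))
```

Under this sketch, a preprocessing pipeline whose scores rank every spiked probe set above every non-spiked one yields an AUC of 1.0, while a pipeline that ranks them no better than chance yields a value near 0.5, which is what makes the AUC usable for comparing pipelines on the same spike-in data set.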
Keywords/Search Tags:Gene, Expression microarrays, Data sets, Statistical, Distributions, Preprocessing, Cluster, Intensities