Automatic phenotype structure mining underlying gene expression profiles

Posted on: 2006-05-12
Degree: Ph.D
Type: Dissertation
University: State University of New York at Buffalo
Candidate: Tang, Chun
Full Text: PDF
GTID: 1454390008956820
Subject: Computer Science
Abstract/Summary:
Recently introduced DNA microarray technology permits rapid, large-scale screening for patterns of gene expression and gives simultaneous, semi-quantitative readouts of the expression levels of thousands of genes across samples. The raw microarray data (images) can then be transformed into gene expression matrices, where a row usually represents a gene and a column represents a sample. The numeric value in each cell characterizes the expression level of a particular gene in a particular sample. Microarray technology has had a significant impact on the field of bioinformatics, requiring innovative techniques to efficiently and effectively extract, analyze, and visualize these fast-growing data.
In this dissertation, we explore the new problem of mining phenotype structure from gene expression data sets and propose a novel unsupervised analysis framework to detect the phenotypes and informative genes underlying such data sets. We transform the phenotype structure mining problem into an optimization problem. A series of statistical measurements is proposed to measure the quality of the mining results. These measurements delineate local pattern qualities based on a partition of samples over a subset of genes, coordinating sample phenotype discovery with informative space detection. A phenotype quality function, which serves as the objective function of the optimization problem, is defined in terms of these statistical measurements. Two unsupervised learning algorithms are developed: the heuristic search method and the mutual reinforcing adjustment method. Iterative pattern adjustment strategies are presented to approach the optimal solution, at which the pattern quality is maximized. The methods dynamically measure and manipulate the relationship between samples and genes while iteratively adjusting both, so that the informative genes and the phenotypes of the samples are approximated simultaneously.
We present an extensive performance study on both real-world and synthetic data sets. Our results strongly suggest that the two proposed methods are effective and scalable. Their mining results are clearly better than those of previous methods, and the methods are ready for real-world applications. The mutual reinforcing adjustment method is in general more scalable and more effective, and it yields higher-quality mining results. (Abstract shortened by UMI.)
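The setup described in the abstract can be made concrete with a small sketch. The code below builds a toy gene expression matrix (rows are genes, columns are samples) and scores a candidate partition of the samples over a subset of genes. The scoring rule used here, between-group variance divided by within-group variance, is only an illustrative stand-in: the abstract does not give the dissertation's actual statistical measurements or its phenotype quality function. The names `gene_score`, `partition_quality`, and `informative_genes`, the threshold value, and all data values are hypothetical.

```python
from statistics import mean, pvariance

# Toy expression matrix (hypothetical values): rows are genes, columns are samples.
expression = [
    [9.0, 8.5, 9.2, 1.0, 1.3, 0.8],  # gene 0: differs strongly between the two groups
    [0.9, 1.1, 1.0, 7.8, 8.1, 8.4],  # gene 1: differs strongly between the two groups
    [5.0, 4.8, 5.1, 5.2, 4.9, 5.0],  # gene 2: roughly flat across all samples
]


def gene_score(row, groups):
    """Illustrative score for one gene under a sample partition:
    variance of the group means divided by the mean within-group variance.
    This is an assumed stand-in, not the dissertation's measurement."""
    eps = 1e-9  # avoids division by zero for perfectly uniform groups
    values = [[row[s] for s in group] for group in groups]
    within = mean([pvariance(v) for v in values])
    between = pvariance([mean(v) for v in values])
    return between / (within + eps)


def partition_quality(matrix, genes, groups):
    """Average the per-gene score over the chosen subset of genes."""
    return mean([gene_score(matrix[g], groups) for g in genes])


def informative_genes(matrix, groups, threshold=1.0):
    """Keep genes whose score exceeds a hypothetical threshold."""
    return [g for g, row in enumerate(matrix) if gene_score(row, groups) > threshold]


good_partition = [[0, 1, 2], [3, 4, 5]]   # matches the structure built into the toy data
mixed_partition = [[0, 3, 4], [1, 2, 5]]  # mixes samples from both groups

selected = informative_genes(expression, good_partition)
quality_good = partition_quality(expression, selected, good_partition)
quality_mixed = partition_quality(expression, selected, mixed_partition)
```

In this toy setting the score depends on both the sample partition and the gene subset, which is the coupling that an iterative adjustment of genes and samples would exploit. The sketch evaluates fixed candidates only; it does not implement the heuristic search or the mutual reinforcing adjustment method.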
Keywords/Search Tags: Gene, Mining, Phenotype structure, Data sets, Quality