
Incorporating biological knowledge of genes into microarray data analysis

Posted on: 2010-12-28
Degree: Ph.D.
Type: Dissertation
University: University of Minnesota
Candidate: Tai, Feng
GTID: 1444390002972235
Subject: Statistics
Abstract/Summary:
Microarray data analysis has become one of the most active research areas in bioinformatics over the past twenty years. An important application of microarray technology is to reveal relationships between gene expression profiles and various clinical phenotypes. A defining characteristic of microarray data is the so-called "large p, small n" problem: the number of genes p far exceeds the number of samples n, which makes parameter estimation difficult. Most traditional statistical methods developed in this area aim to overcome this difficulty, and the most popular technique is to use an L1-norm penalty to introduce sparsity into the model.

However, most of these traditional methods for microarray data analysis treat all genes equally, as ordinary covariates. Recent developments in gene functional studies have revealed complicated relationships among genes from a biological perspective: genes can be categorized into biological functional groups or pathways. Such biological knowledge, together with microarray gene expression profiles, provides information about relationships not only between genes and clinical outcomes but also among the genes themselves. Using this information could potentially improve both predictive power and gene selection. The importance of incorporating biological knowledge into the analysis has been increasingly recognized in recent years, and several new methods have been developed.

In our study, we focus on incorporating biological information, such as the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, into microarray data analysis for the purpose of prediction. Our first method aims to implement this idea by specifying different L1 penalty terms for different gene functional groups. Our second method models a covariance matrix for the genes by assuming stronger within-group correlations and weaker between-group correlations. The third method models spatial correlations among the genes over a gene network in a Bayesian framework.
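To make the idea of "different L1 penalty terms for different gene functional groups" concrete, here is a minimal sketch, not the dissertation's implementation. It assumes a linear model with squared-error loss, each gene assigned to exactly one group, and one penalty λ_g per group, so the objective is (1/2n)‖y − Xβ‖² + Σ_j λ_{g(j)} |β_j|. It fits this by cyclic coordinate descent with soft-thresholding. The function names `group_specific_lasso` and `soft_threshold` and the penalty values are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def group_specific_lasso(X, y, groups, lambdas, n_iter=200, tol=1e-8):
    """Hypothetical sketch: coordinate descent for
        (1/2n) ||y - X b||^2 + sum_j lambdas[groups[j]] * |b_j|.

    groups  : length-p integer array, functional group of each gene (assumed
              to come from e.g. GO or KEGG annotations).
    lambdas : one L1 penalty per group (illustrative tuning values).
    """
    n, p = X.shape
    beta = np.zeros(p)
    penalty = np.asarray(lambdas)[np.asarray(groups)]  # per-gene penalty
    col_sq = (X ** 2).sum(axis=0) / n                   # (1/n) ||x_j||^2
    resid = y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # Correlation of gene j with the partial residual (its own term added back).
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, penalty[j]) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)  # keep residual in sync with beta
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta
```

Giving a group with presumed relevance a smaller λ lets its genes enter the model more easily, while a large λ on another group shrinks its coefficients toward exactly zero. With every λ set to zero the procedure reduces to ordinary least squares.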
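One simple way to encode "stronger within-group correlations and weaker between-group correlations" is a block-structured correlation matrix with one value ρ_w for gene pairs in the same functional group and a smaller value ρ_b for pairs in different groups. The sketch below builds such a matrix, checks that it is positive definite, and simulates expression data from it. It is an illustration under these assumptions only; the dissertation's actual covariance parameterization is not given here, and the function names are hypothetical.

```python
import numpy as np


def grouped_correlation_matrix(groups, rho_within, rho_between):
    """Hypothetical sketch: R[i, j] = rho_within if genes i and j share a group,
    rho_between otherwise, with 1 on the diagonal."""
    g = np.asarray(groups)
    same = g[:, None] == g[None, :]
    R = np.where(same, rho_within, rho_between).astype(float)
    np.fill_diagonal(R, 1.0)
    return R


def is_positive_definite(M):
    """A valid covariance/correlation matrix must admit a Cholesky factor."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False


def simulate_expression(groups, rho_within, rho_between, n, seed=0):
    """Draw n samples of gene expression from N(0, R) with the grouped structure."""
    R = grouped_correlation_matrix(groups, rho_within, rho_between)
    if not is_positive_definite(R):
        raise ValueError("correlation values do not give a positive definite matrix")
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(len(R)), R, size=n)
```

Not every pair (ρ_w, ρ_b) is admissible: for example, a strongly negative ρ_w within a group of three or more genes gives a matrix with a negative eigenvalue, which is why the positive-definiteness check is included.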
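Spatial correlation over a gene network is commonly modeled with a conditional autoregressive (CAR) prior, treating genes as nodes and known interactions as edges. The sketch below assumes a proper CAR prior on the regression coefficients, β ~ N(0, Q⁻¹) with Q = τ(D − αA), where A is the network adjacency matrix, D is the diagonal matrix of node degrees, τ > 0, and |α| < 1. Under this prior, each gene's conditional prior mean is α times the average of its neighbours' coefficients. Whether the dissertation uses this exact form is not stated here; the CAR choice and the function names are assumptions for illustration.

```python
import numpy as np


def car_precision(adjacency, alpha, tau):
    """Proper CAR precision Q = tau * (D - alpha * A), D = diag(node degrees).
    Positive definite when every gene has at least one neighbour and |alpha| < 1."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))
    return tau * (D - alpha * A)


def car_log_density(beta, Q):
    """log N(beta | 0, Q^{-1}) = 0.5 log|Q| - 0.5 beta'Q beta - (p/2) log(2 pi)."""
    sign, logdet = np.linalg.slogdet(Q)
    if sign <= 0:
        raise ValueError("Q is not positive definite")
    p = beta.shape[0]
    return 0.5 * logdet - 0.5 * beta @ Q @ beta - 0.5 * p * np.log(2 * np.pi)


def car_conditional_mean(beta, adjacency, alpha, i):
    """E[beta_i | beta_{-i}] = alpha * (mean of beta over gene i's network neighbours)."""
    A = np.asarray(adjacency, dtype=float)
    return alpha * (A[i] @ beta) / A[i].sum()
```

Setting α = 1 gives the intrinsic CAR model, whose precision matrix is the singular graph Laplacian; it is then an improper prior and the log-density above is not defined.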
Keywords/Search Tags: Microarray data analysis, Genes, Incorporating biological, Biological knowledge