Statistical methods for analysis of graph-constrained genomic data

Posted on: 2010-09-12
Degree: Ph.D
Type: Dissertation
University: University of Pennsylvania
Candidate: Li, Caiyan
Full Text: PDF
GTID: 1448390002987389
Subject: Biology
Abstract/Summary:
Graphs and networks are common ways of depicting information. In biology, many different processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This kind of prior information, accumulated over many years of biomedical research, is a useful supplement to standard numerical genomic data such as microarray gene expression data. Incorporating the information encoded by known biological pathways into the analysis of numerical data raises interesting statistical challenges. This dissertation develops several statistical methods for the analysis of genomic data that incorporate prior biological network information. We consider the high-dimensional regression problem in which the covariates are measured on undirected graphs, and develop methods for identifying genes and subnetworks that are related to the phenotypes. Specifically, we present the problem formulation and an efficient computational algorithm for our procedure, the GRAph-Constrained Estimator (GRACE), and develop its theoretical properties, including non-asymptotic error bounds and sign consistency for both fixed and diverging numbers of parameters. We also introduce an empirical Bayes method that takes the biological network structure into account through a discrete Markov Random Field prior, for identifying genes and subnetworks whose transcription activities are perturbed by, or activated in response to, experimental conditions. We apply both GRACE and the empirical Bayes method to a microarray gene expression study of human brain aging to identify genes or subnetworks that are related to or perturbed by human brain aging. Extensions of the proposed methods to censored survival data are also presented.
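To make the idea of regression with covariates on an undirected graph concrete, here is a minimal sketch. It is not the dissertation's algorithm. The abstract does not give the GRACE objective, so the penalty is an assumption: an L1 term for sparsity plus a quadratic term built from the normalized graph Laplacian, which encourages linked genes to have similar degree-scaled coefficients. The function names, the proximal-gradient solver and the toy three-gene path graph are all hypothetical choices made for illustration.

```python
import numpy as np


def normalized_laplacian(adj):
    """Normalized Laplacian I - D^{-1/2} A D^{-1/2} of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    # Isolated nodes get a zero row and column.
    return np.diag(connected.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def graph_penalized_regression(X, y, lap, lam1, lam2, n_iter=2000):
    """Minimize (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*b'Lb.

    Assumed objective, solved by plain proximal gradient descent (ISTA).
    """
    n, p = X.shape
    hessian = X.T @ X / n + lam2 * lap            # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(hessian)[-1]  # 1 / largest eigenvalue
    xty = X.T @ y / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = hessian @ beta - xty
        beta = soft_threshold(beta - step * grad, step * lam1)
    return beta


# Toy usage: three genes on a path graph 0 - 1 - 2, simulated data.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
lap = normalized_laplacian(adj)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 1.0, 0.0]) + 0.1 * rng.normal(size=40)
beta_hat = graph_penalized_regression(X, y, lap, lam1=0.01, lam2=0.1)
print(beta_hat)
```

In this sketch `lam1` controls how many coefficients are set exactly to zero, and `lam2` controls how strongly the graph smooths the remaining ones. Both would normally be chosen by cross-validation.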
Keywords/Search Tags:Data, Methods, Information, Statistical, Genomic, Biological