
Statistical Models For The Analysis Of Gene Networks

Posted on: 2012-07-12 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X H Zhang
GTID: 1100330335962458 | Subject: Biomedical engineering
Abstract/Summary:
Essentially all biological functions of a living cell are carried out through the interplay between many genes and their products. Understanding the interactions between genes and their functions in a living system is a central challenge in systems biology, a rapidly evolving research area fueled by recent advances in high-throughput biotechnologies. These technologies enable the collection of large-scale genomic data on gene expression, protein-protein interactions, genome-wide location, and genetic variation, as well as many other types of data. Such data provide valuable system-level information on different aspects of complex biological processes and make it possible to view a cell as a system and infer its underlying networks. Many computational and statistical methods have been proposed that use these data to dissect transcriptional regulatory networks and thereby understand the mechanisms behind different biological processes and organismal behaviors. Despite extensive research on this topic, elucidating the complete network remains a great challenge because high-throughput data are noisy and transcriptional processes are complex.
The dissertation focuses on statistical models motivated by real problems in systems biology. Several topics related to gene networks, namely module analysis of gene coexpression networks, gene regulatory networks, and differential gene network analysis, are investigated in depth using gene expression data. The main contributions of the dissertation are as follows:
1. Because gene regulatory networks are complex to describe in detail, many researchers have used the concept of gene modules to simplify the description of gene networks. Although many different methods have been proposed to identify gene modules from various data sources, and many useful results have been obtained, there is no consensus on the definition of a module and little understanding of the biological basis of modules.
Here we present an analysis of gene modules based on a weighted gene coexpression network, to explore how these modules relate to the underlying regulatory processes. Information from Gene Ontology, annotated pathways, and genome-wide location data is used to interpret the biological meaning of the modules. We found that the topological overlap measure (TOM) is a better way to extract gene modules, and that the resulting modules are biologically meaningful and consistent, which implies that modules may represent an inherent property of the underlying biological processes. Furthermore, using expression quantitative trait loci (eQTL) analysis, we identified genomic regions that affect the expression levels of genes in a module, which partly explains the genetic basis of gene modules. The results indicate that modules can facilitate the analysis of gene expression data and lead to a better understanding of gene networks.
2. Gene regulatory networks play an important role in every life process. Elucidating these networks could help shed light on the mechanisms of cell differentiation, metabolism, signal transduction, and disease. Significant efforts have been made to reconstruct gene regulatory networks from the rapidly accumulating genome-wide gene expression data. However, existing methods suffer from relatively low accuracy because of the complex relationships among the large number of genes involved, and reverse engineering gene regulatory networks remains a challenging task. Integrative analysis of multiple heterogeneous gene expression datasets may provide an effective way to increase estimation accuracy. We propose a new reverse engineering method that efficiently utilizes gene expression data from various perturbation experiments. The key idea is to use different models to handle heterogeneous data and to integrate these multiple sources of information with Fisher's method.
The simulation study shows that expression data from gene knockout experiments are the most informative for network reconstruction, and that integrating multiple gene expression datasets can improve accuracy. We applied our method to the DREAM4 in silico network challenge to demonstrate its performance, and it won 2nd place in sub-challenge 1.
3. In the study of microarray gene expression data, there has recently been a major shift from differential expression analysis to differential network analysis. Gene associations are dynamic and condition-specific in nature: under different conditions, gene networks exhibit different association patterns. Identifying condition-specific gene associations has important biological applications, such as discovering alterations in gene association networks or gene pathways across biological conditions, which can provide insights into the pathophysiology of disease and help identify drug targets. Analyzing each microarray dataset from each condition separately may lack the power to detect condition-specific gene associations because of small sample sizes, whereas pooling multiple heterogeneous datasets may provide an effective way to increase statistical power. We therefore propose a novel hierarchical Bayesian method for detecting condition-specific gene associations that jointly models heterogeneous microarray gene expression data collected under varying biological conditions. Our model uses a spike-and-slab prior to account for the sparsity of gene associations and produces, for each gene pair, a posterior probability of differential association, which is the basis for condition-specific inference. We evaluate the performance of the model comprehensively through simulation studies and real data analysis. The results on simulated data demonstrate the advantage of the hierarchical Bayesian model.
Our model generally outperforms other methods on several performance measures across a range of association-alteration patterns. We also applied the method to real data from a cross-population comparison among HapMap samples. The results demonstrate that the proposed method is powerful at identifying meaningful condition-specific gene associations.
The study was supported in part by a fellowship from the China Scholarship Council (CSC:2008634012) and NIH grant GM59507.
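Contribution 1 relies on the topological overlap measure computed on a weighted gene coexpression network. As an illustration only, the sketch below follows the commonly used unsigned WGCNA-style definition: adjacency a_ij = |cor(x_i, x_j)|^beta and TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij sums a_iu * a_uj over the other genes u and k_i is the connectivity of gene i. The power beta = 6, the function name `tom_similarity`, and the toy data are assumptions, not settings reported in the dissertation.

```python
import numpy as np


def tom_similarity(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Unsigned topological overlap matrix from a samples x genes matrix.

    Hypothetical helper; beta = 6 is an assumed soft-threshold power.
    """
    # Pearson correlation between genes (columns of expr)
    corr = np.corrcoef(expr, rowvar=False)
    # Soft-thresholded adjacency a_ij = |cor_ij|^beta, with no self-loops
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)
    # Shared-neighbour term l_ij = sum_u a_iu * a_uj; u != i, j since the diagonal is 0
    shared = adj @ adj
    # Connectivity k_i = sum_u a_iu
    k = adj.sum(axis=1)
    denom = np.minimum.outer(k, k) + 1.0 - adj
    tom = (shared + adj) / denom
    # A gene overlaps perfectly with itself
    np.fill_diagonal(tom, 1.0)
    return tom


# Toy example: two hypothetical groups of three co-regulated genes (50 samples)
rng = np.random.default_rng(0)
z1 = rng.normal(size=(50, 1))
z2 = rng.normal(size=(50, 1))
expr = np.hstack([
    z1 + 0.3 * rng.normal(size=(50, 3)),  # genes 0-2 share driver z1
    z2 + 0.3 * rng.normal(size=(50, 3)),  # genes 3-5 share driver z2
])
tom = tom_similarity(expr)
```

In this kind of workflow, modules are typically obtained by hierarchical clustering of the dissimilarity 1 - TOM; because TOM rewards shared neighbours and not only direct correlation, it tends to be less sensitive to noise in individual correlations.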
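Contribution 2 integrates evidence from heterogeneous perturbation experiments with Fisher's method. The sketch below shows only that combination step, assuming each data source has already produced an independent p-value per candidate edge. Under the global null, X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom, so only the standard library is needed. The edge names and p-values are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Combine k independent p-values with Fisher's method.

    Returns P(chi2_{2k} > X) with X = -2 * sum(ln p_i), using the
    closed form exp(-X/2) * sum_{j<k} (X/2)^j / j! for even df.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Clamp to avoid log(0) for p-values reported as exactly zero
    x = -2.0 * sum(math.log(max(p, 1e-300)) for p in pvalues)
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


# Hypothetical per-edge p-values, e.g. [knockout model, time-series model]
edge_pvals = {
    ("G1", "G2"): [0.01, 0.04],
    ("G1", "G3"): [0.6, 0.3],
}
# Rank candidate edges by combined evidence (smallest p-value first)
ranked = sorted(edge_pvals, key=lambda e: fisher_combine(edge_pvals[e]))
```

Fisher's method assumes the p-values are independent under the null; this is a reasonable working assumption when each source comes from a separate experiment, but correlated sources would make the combined p-value anti-conservative.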
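Contribution 3 turns a spike-and-slab prior into a per-pair posterior probability of differential association. The dissertation's hierarchical model is not specified here, so the following is a deliberately simplified two-condition illustration of the spike-and-slab idea, not the author's method: the difference d of Fisher z-transformed correlations is modeled as d ~ N(delta, s^2) with s^2 = 1/(n1 - 3) + 1/(n2 - 3), and delta has prior mass 1 - pi at zero (spike) and weight pi on N(0, tau^2) (slab). The values pi = 0.1 and tau = 0.5, the use of marginal rather than partial correlations, and the function names are all assumptions.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def prob_differential(r1, n1, r2, n2, pi=0.1, tau=0.5):
    """Posterior P(delta != 0 | d) under an assumed spike-and-slab prior.

    r1, r2: sample correlations of one gene pair in two conditions (|r| < 1)
    n1, n2: sample sizes; pi, tau: hypothetical prior settings.
    """
    # Fisher z-transform makes each correlation approximately normal
    d = math.atanh(r1) - math.atanh(r2)
    s2 = 1.0 / (n1 - 3) + 1.0 / (n2 - 3)
    # Marginal likelihood of d under the slab: N(0, s2 + tau^2)
    slab = pi * normal_pdf(d, s2 + tau ** 2)
    # Marginal likelihood of d under the spike (delta = 0): N(0, s2)
    spike = (1.0 - pi) * normal_pdf(d, s2)
    return slab / (slab + spike)


# Hypothetical gene pairs observed with 50 samples in each condition
p_changed = prob_differential(0.8, 50, -0.2, 50)  # association reverses
p_stable = prob_differential(0.3, 50, 0.3, 50)    # association unchanged
```

The spike concentrates prior belief on "no change", which encodes the sparsity assumption: only pairs whose correlation difference is large relative to its sampling error receive a high posterior probability. A hierarchical version would, in addition, learn quantities such as pi and tau from all gene pairs and conditions jointly.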
Keywords/Search Tags: Systems biology, Gene coexpression network, Gene regulatory network, Gene association network, Differential network analysis, Hierarchical Bayesian model, Gaussian graphical model, Gene module