Molecular Network Study Based On Gene Expression

Posted on: 2017-11-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Wu
GTID: 1360330590990802
Subject: Control Science and Engineering

Abstract/Summary:
An organism is an extremely complex system that controls its heredity, growth and development throughout life. Differences between organisms are mainly caused by differences in their complete sets of genetic material, referred to as the genome. The number and classes of genes are the same in the different cells of an organism, but gene expression is specific in several ways, such as tissue specificity, cell cycle specificity and specificity of response to external signals. As a result, the expression patterns of genes differ markedly between cells of different tissues, or between cells of the same tissue under different conditions. Gene expression data contain a large amount of information about gene activity, which can reflect the physiological status of the cell, such as whether the cell is in a normal or diseased state. With the rapid development of next-generation sequencing technology and the introduction of various effective analysis methods, many researchers focus on the analysis of large-scale gene expression data. Such analysis not only deepens our understanding of life but also uncovers significant biological knowledge, including implicit expression patterns and regulatory mechanisms. This information greatly advances the understanding, diagnosis and therapy of diseases.

In this dissertation, we analyze gene expression data from three aspects: gene differential expression analysis, gene co-expression network analysis and gene regulatory network inference. The main contributions are as follows.

1. Gene differential expression analysis based on the Poisson log-normal distribution. We use the Poisson log-normal distribution to model next-generation sequencing data. This distribution not only describes the over-dispersion of the data but also provides a better fit to both low and high expression levels. However, the Poisson log-normal distribution function cannot be expressed in analytic form. To overcome this disadvantage, we apply a gene subset selection strategy to reduce the error of the analytical approximation. Simulation results show that the strategy greatly improves the precision of the variance estimation. Additionally, we propose a mean-log method to estimate the expectation of gene expression levels across all samples under one condition, which reduces computational complexity while still meeting the precision requirement. Furthermore, we compare our method with commonly used methods; the results indicate that our method has better discrimination ability and achieves a better tradeoff between recall and precision.

2. Gene co-expression network analysis with gastric cancer data. Gene differential expression analysis usually treats genes individually, although genes are not entirely independent. The gene co-expression network is a powerful tool for analyzing the dependence between genes. Using the gastric cancer data, we apply the weighted gene co-expression analysis algorithm to construct a normal-associated gene co-expression network and a tumor-associated gene co-expression network. By comparing the two networks, we identify several genes and modules that are closely associated with gastric carcinogenesis.

3. Dynamic characteristics analysis of the gene co-expression network with the gastric cancer data. Because carcinogenesis is a complex process involving the gradual accumulation and interaction of genetic mutations, the expression patterns of genes differ between tumor stages. To further understand gastric cancer, we divide the gastric cancer data into five phenotypes (Normal, Stage I, Stage II, Stage III and Stage IV) according to the clinical data, and then construct a gene co-expression network for each phenotype. By analyzing the dynamic changes among these networks, we find some specific network features. For example, the connectivity of genes in the four tumor-associated networks is significantly smaller than the connectivity of genes in the normal-associated network. Additionally, according to the dynamic change of gene connectivity across all five phenotypes, we cluster the genes with the k-means algorithm and find three classes of genes that are closely associated with the different gastric cancer stages.

4. Gene regulatory network inference with a multi-level strategy. The gene co-expression network reflects the dependence between genes, but it does not reflect the regulatory relationships between them. We propose a gene regulatory network inference algorithm with a multi-level strategy. We first obtain the raw inferred gene regulatory network with the guided regularized random forest algorithm. We then normalize the result with a q-norm based method. Finally, we refine the result based on the assumption that gene regulatory networks are sparse. To establish the accuracy and robustness of our method, we compare it with state-of-the-art methods on the benchmark networks provided by the DREAM projects. The comparison indicates that the proposed method outperforms the other methods on most benchmark networks and that the multi-level strategy significantly improves the performance of gene regulatory network inference.
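As an illustration of the first contribution, the following minimal sketch shows why the Poisson log-normal distribution needs a numerical or analytical approximation and how it captures over-dispersion. It assumes a count Y that is Poisson with rate exp(x), where x is normally distributed with mean mu and standard deviation sigma. The probability P(Y = y) is then an integral over x with no closed form, which the sketch approximates with a plain trapezoidal rule. The function names, grid size and integration width are hypothetical choices made for illustration only; this is not the approximation or the subset selection strategy used in the dissertation.

```python
import math


def poisson_lognormal_pmf(y, mu, sigma, n_grid=2001, width=8.0):
    """Approximate P(Y = y) for Y | x ~ Poisson(exp(x)), x ~ Normal(mu, sigma^2).

    The integral over x has no closed form, so it is evaluated with the
    trapezoidal rule on [mu - width*sigma, mu + width*sigma] (an assumption
    of this sketch, not the dissertation's method).
    """
    lo = mu - width * sigma
    hi = mu + width * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * step
        # log of the Poisson pmf with rate exp(x): y*x - exp(x) - log(y!)
        log_poisson = y * x - math.exp(x) - math.lgamma(y + 1)
        # log of the Normal(mu, sigma^2) density at x
        log_normal = (-0.5 * ((x - mu) / sigma) ** 2
                      - math.log(sigma * math.sqrt(2.0 * math.pi)))
        # trapezoidal rule: end points get half weight
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(log_poisson + log_normal)
    return total * step


def poisson_lognormal_moments(mu, sigma):
    """Exact mean and variance of the Poisson log-normal distribution."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    # variance = Poisson part (mean) + log-normal part; always exceeds the mean
    variance = mean + (math.exp(sigma ** 2) - 1.0) * mean ** 2
    return mean, variance


if __name__ == "__main__":
    mean, variance = poisson_lognormal_moments(mu=1.0, sigma=0.5)
    print("mean:", mean, "variance:", variance)
    print("P(Y = 3):", poisson_lognormal_pmf(3, mu=1.0, sigma=0.5))
```

The variance is always larger than the mean whenever sigma is positive, which is the over-dispersion that a plain Poisson model cannot represent.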
Keywords/Search Tags:Gene expression, differential expression analysis, Poisson log-normal distribution, gene co-expression network, dynamic analysis, gastric cancer, gene regulatory network inference, multi-level strategy