Negatively Correlated Gene Expression Patterns And Their Conservation

Posted on: 2016-07-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Tu
GTID: 1220330503952386
Subject: Biomedical engineering
Abstract/Summary:
With advances in DNA microarray technology, it is now possible to measure the expression levels of thousands of genes simultaneously. High-throughput RNA sequencing (RNA-Seq) provides more precise measurements of transcript levels at a significantly lower cost. Popular methods for analyzing gene expression data include identifying differentially expressed genes, clustering, two-way clustering and constructing gene regulatory networks. Although these methods can uncover part of the information hidden in gene expression data, the information they yield is relatively limited. In disease research, and in cancer research in particular, many advanced methods and resources such as The Cancer Genome Atlas (TCGA) database have been used, yet they have offered little help for the treatment of cancer. It is the right time to reflect on the direction of current approaches.

In gene expression data, the expression of most genes does not change significantly; only a small number of genes are differentially expressed. Of these, some are up-regulated and the others are down-regulated. Are the down-regulated genes correlated with the up-regulated genes? Do the activated and repressed genes affect each other?

A negative correlation pattern is defined as two subsets V1 and V2 of a gene set V, each containing one or more genes, such that the genes within each subset have similar expression tendencies while the two subsets show opposite changing tendencies over a subset T of time points or experimental conditions.
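To make the definition above concrete, the following minimal Python sketch checks whether two candidate gene subsets form a negative correlation pattern over a chosen set of conditions. It is an illustration only, not the NCFCA algorithm: it assumes Pearson correlation as the measure of "similar" and "opposite" tendency, and the function name `is_negative_pattern` and both thresholds are hypothetical choices that do not come from the dissertation.

```python
import numpy as np


def is_negative_pattern(expr, v1, v2, conds, within_min=0.8, between_max=-0.8):
    """Check the definition of a negative correlation pattern on one candidate.

    expr   : 2-D array, rows = genes, columns = time points / conditions
    v1, v2 : row indices of the two gene subsets (each needs >= 1 gene)
    conds  : column indices of the condition subset T
    The two thresholds are illustrative assumptions, not values from the thesis.
    """
    v1, v2, conds = list(v1), list(v2), list(conds)
    sub = expr[np.ix_(v1 + v2, conds)]   # restrict to V1 + V2 and to T
    corr = np.corrcoef(sub)              # Pearson correlation between gene rows
    n1 = len(v1)
    within1 = corr[:n1, :n1]             # gene pairs inside V1
    within2 = corr[n1:, n1:]             # gene pairs inside V2
    between = corr[:n1, n1:]             # gene pairs across V1 and V2
    # Genes inside each subset must share a similar tendency ...
    similar = within1.min() >= within_min and within2.min() >= within_min
    # ... and every cross-subset pair must show the opposite tendency.
    opposite = between.max() <= between_max
    return bool(similar and opposite)


# Toy data: rows 0-1 rise and fall together, rows 2-3 mirror them,
# row 4 is unrelated to both groups.
t = np.linspace(0, 2 * np.pi, 12)
expr = np.vstack([
    np.sin(t),
    1.1 * np.sin(t) + 0.2,
    -np.sin(t),
    -0.9 * np.sin(t) + 1.0,
    np.cos(t),
])
mirrored = is_negative_pattern(expr, [0, 1], [2, 3], range(12))
unrelated = is_negative_pattern(expr, [0, 1], [4], range(12))
```

Requiring every pair to pass the threshold is a strict reading of the definition; a mean-based criterion would tolerate noisier data, at the cost of admitting subsets that contain outlier genes.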
Negative correlation patterns have not yet been studied systematically. To address this, this dissertation combines bioinformatics and biology to study negative correlation patterns between activated and repressed genes from three aspects, taking an energy conservation point of view. The main contents and results are as follows:

(1) Algorithms for identifying negative correlation patterns.

① A negatively correlated biclustering algorithm (NCFCA) was designed based on formal concept analysis. Compared with similar algorithms, NCFCA outperformed them in average balanced rate, average Pearson correlation coefficient and clustering score, and identified more size-balanced negative correlation patterns.

② No current algorithm, including NCFCA, can handle large datasets. When the gene expression data are very large (for example, more than 300 experimental conditions, more than 1000 genes, and so on), the running time of the algorithms increases significantly. To solve this problem, a second negatively correlated biclustering algorithm (NCFCA2) was developed using formal concept analysis together with multi-core CPU parallelism. Because it uses multi-core computing, NCFCA2 runs in significantly less time than NCFCA.

(2) Negative correlation patterns and their conservation (invariance) in three different datasets.

① The NCFCA algorithm was applied to 800 cell-cycle-regulated genes from each of the alpha 26, alpha 30 and alpha 38 time-course gene expression datasets of the yeast cell cycle.
After post-processing the results, it was found that the expression curves of the genes encoding minichromosome maintenance proteins (MCM genes) are negatively correlated with those of the genes encoding histones (histone genes), and this negative correlation pattern was found in all three datasets (alpha 26, alpha 30 and alpha 38). The traditional view is that two groups of genes with negatively correlated expression trends generally have no functional similarity. However, gene set enrichment analysis of the six MCM genes and eight core histone genes together showed that the two groups have significant functional similarity. This finding suggests that two groups of genes involved in the same biological process may have negatively correlated expression trends. The NCFCA algorithm was then applied to ten other time-course gene expression datasets of the yeast cell cycle; the expression curves of the MCM genes were again negatively correlated with those of the histone genes in these ten datasets, and similar results were found in two recently published tiling sequenced datasets. These findings suggest that the negative correlation pattern between the six MCM genes and the eight core histone genes may be conserved (invariant). A study of the transcriptional mechanisms of the MCM complex genes and the core histone genes indicated that this conserved negative correlation pattern may be caused by the Clb-CDK1 kinase through a coregulation and a negative regulation: Clb-CDK1 may activate or inhibit these two groups of genes at different stages of the cell cycle.

② The NCFCA2 algorithm was applied to the GSE26169 and 2010.Shapira04 gene expression datasets on the oxidative stress response, with genes selected from all the pathways.
After post-processing the results, it was found that the expression curves of genes from the starch and sucrose metabolism pathway are negatively correlated with those of genes from the purine metabolism pathway. In other words, genes from two different pathways may form a negative correlation pattern in gene expression data of the environmental stress response. The NCFCA2 algorithm was then applied to ten other time-course gene expression datasets on the environmental stress response, and the same negative correlation between the two pathways was found. These findings suggest that the negative correlation pattern between genes from the starch and sucrose metabolism pathway and genes from the purine metabolism pathway may be conserved (invariant). A study of the transcriptional mechanisms of the genes in these two pathways indicated that this conserved negative correlation pattern may be caused by target of rapamycin complex 1 (TORC1) through a coregulation and a negative regulation.

③ The NCFCA2 algorithm was applied to the GSE26169 dataset on the oxidative stress response and the Gasch2000 dataset on the heat shock response, in each case using the top 1000 genes ranked by the variance of their expression values. After post-processing the results, it was found that the genes encoding ribosomal proteins are negatively correlated both with the heat shock response genes and with the oxidative stress genes.
The NCFCA2 algorithm was then applied to ten other time-course gene expression datasets on the environmental stress response, and the expression curves of the genes encoding ribosomal proteins were again negatively correlated with those of the stress response genes in these datasets. These findings suggest that the negative correlation pattern between genes encoding ribosomal proteins and stress response genes may be conserved (invariant). A study of the transcriptional mechanisms of the genes encoding ribosomal proteins, the heat shock response genes and the oxidative stress genes indicated that the negative correlation patterns between ribosomal protein genes and stress response genes, such as heat shock response genes and oxidative stress genes, may all be caused by target of rapamycin complex 1 (TORC1) through a coregulation and a negative regulation.

These results suggest that the NCFCA algorithm can identify many negative correlation patterns from gene expression data, especially size-balanced ones, at low time and space cost. After redundancy filtering and gene set enrichment analysis (GSEA), only a small number of negative correlation patterns are biologically significant and may be conserved (invariant). These findings indicate that negative correlation patterns may be caused by a key regulator through a coregulation and a negative regulation. Biological systems may coordinate the activation and inhibition of gene expression within each subsystem. From the energy point of view, biological systems may maintain the balance between energy supply and demand.
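The variance-based gene selection mentioned in item ③ (keeping the top 1000 genes ranked by the variance of their expression values) can be sketched as follows. This is a minimal illustration assuming a genes-by-conditions NumPy matrix; the function name `top_variance_genes` is hypothetical, and the abstract does not say how ties or missing values were handled.

```python
import numpy as np


def top_variance_genes(expr, k=1000):
    """Return the row indices of the k genes with the largest variance.

    expr : 2-D array, rows = genes, columns = time points / conditions
    k    : number of genes to keep (capped at the number of rows)
    """
    variances = np.var(expr, axis=1)      # per-gene variance across conditions
    k = min(k, expr.shape[0])
    order = np.argsort(variances)[::-1]   # gene indices, highest variance first
    return order[:k]


# Toy matrix: gene 0 is flat, gene 1 varies strongly, gene 2 varies mildly.
toy = np.array([
    [1.0, 1.0, 1.0],
    [0.0, 5.0, 10.0],
    [2.0, 3.0, 4.0],
])
selected = top_variance_genes(toy, k=2)
filtered = toy[selected]                  # reduced matrix passed on to biclustering
```

Filtering in this way reduces the number of rows a biclustering algorithm has to process, which matters given the running-time limits on large datasets noted in item (1)②.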
Keywords/Search Tags: negative correlation patterns, conservation, cell cycle, stress response, yeast