The Co-expression Analysis And Other Extension Applies In Transcriptomic Data

Posted on: 2017-01-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Chen
GTID: 1220330482994779
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of genomic technology, the wave of biological big data has reached everyday research work. In particular, the huge volume of genomic sequencing data and transcriptomic data has not only solved existing biological problems but also raised new questions. One important question is how to process this bulk of data efficiently to reach the conclusions we are after. Another is how to dig deeper into transcriptomic data so that the functional relationships underlying gene function and pathway construction become clearer. Based on these concerns, the author focused mainly on biofuel research during the Ph.D. period, so all the projects here aim to increase biomass output by developing novel algorithms, software and web-based platforms.

The most common approach in this analysis is to apply the co-expression method to transcriptomic data. The first project in this dissertation predicts reliable cell wall (CW) related genes in switchgrass. Building on the large number of plant cell-wall related genes that have been identified or predicted in several model plant genomes, such as Arabidopsis thaliana, Oryza sativa (rice) and Zea mays (maize), we present a computational study for predicting CW genes in switchgrass using a two-step procedure: (i) homology mapping of all annotated CW genes in the model species to switchgrass, giving rise to a total of 991 genes; and (ii) candidate prediction of CW genes based on switchgrass genes co-expressed with the 991 genes under a large number of experimental conditions. Specifically, our co-expression analyses using the 991 genes as seeds led to the identification of 104 large clusters of co-expressed genes, each referred to as a co-expression module (CEM), covering 830 of the 991 genes plus 823 additional genes that are strongly co-expressed with some of the 104 CEMs.
These 1,653 genes represent our prediction of CW genes in switchgrass, 112 of which are homologous to predicted CW genes in Arabidopsis. Functional inference was conducted on these genes to derive the possible functional relations among them. Overall, these data may offer a highly reliable information source for cell-wall biologists working on switchgrass as well as on plants in general.

After this project, we recognized the advantage of the bi-clustering method in plant transcriptomic analysis. We therefore extended the bi-clustering method to develop a local co-expression measure, called the “BF score”, to replace general co-expression measures such as the Pearson or Spearman correlation. Because plant transcriptomic samples typically have few replicates and many conditions, most genes show co-expression with their partners only under certain conditions rather than across the global expression vector. The new local co-expression function is therefore more sensitive than the general methods. With this function, we examined the differences and commonalities in the lignin biosynthesis process in Arabidopsis, maize and switchgrass. We predicted 219, 177 and 532 novel lignin-related genes in the three species, all strongly co-expressed with the lignin biosynthesis gene sets. The BF score was also applied to pathway-level co-expression analysis to predict pathways closely related to lignin biosynthesis.

Another use of the newly defined BF score is a software suite called GeneQC, which evaluates the quality of RNA-seq data in polyploid genomes and re-assigns multiply aligned short reads based on training sets from other transcriptomic data, trained with the BF score function.
The software not only offers botanists useful information for selecting reliable candidate genes for experiments but also gives new insight into how to improve the research value of RNA-seq data in plants.

The last project related to transcriptomic data is a web platform that uses RNA-seq data and an SVM method to predict transcription units (TUs) in prokaryotes. To make this prediction work easy to use, we built a platform called SeqTU. It is a user-friendly platform that needs only a few common pieces of information from the user and automatically completes data downloading, RNA-seq short-read mapping and TU prediction at the back end within a reasonable time.

Finally, we describe two other projects that are not related to transcriptomic data. In one, we used 52 closely related E. coli genomes to study the rules governing the global arrangement of operons, from the perspective of energy efficiency as a driving force in bacterial genome evolution. The other is CINPER, an interactive web-based platform for gene network reconstruction.
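The abstract's argument for a local co-expression measure can be illustrated with a small sketch. This is not the BF score, whose definition is not given here; it only shows why a correlation computed over all conditions can miss a relationship that holds under a subset of them. The function names, the toy expression values and the use of contiguous condition windows (instead of the arbitrary condition subsets a bi-clustering method would select) are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    # A constant vector has no defined correlation; report 0.0 by convention.
    return cov / denom if denom > 0 else 0.0


def best_window_pearson(x, y, window):
    """Highest Pearson correlation over any contiguous block of `window` conditions.

    Assumption: contiguous blocks stand in for the condition subsets that a
    bi-clustering method would find; this is a simplification.
    """
    if not 2 <= window <= len(x):
        raise ValueError("window must be between 2 and the number of conditions")
    return max(
        pearson(x[s:s + window], y[s:s + window])
        for s in range(len(x) - window + 1)
    )


# Hypothetical expression levels for two genes across 12 conditions.
# They rise together in the first six conditions and diverge in the last six.
gene_a = [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]
gene_b = [2, 4, 6, 8, 10, 12, 2, 4, 6, 8, 10, 12]

global_r = pearson(gene_a, gene_b)
local_r = best_window_pearson(gene_a, gene_b, window=6)
print(f"global Pearson: {global_r:.3f}")
print(f"best 6-condition Pearson: {local_r:.3f}")
```

In this toy case the two halves cancel, so the global correlation is near zero while the first six conditions are perfectly correlated. A real analysis would also need to control for the extra false positives that come from searching over many condition subsets.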
Keywords/Search Tags: Bi-clustering, transcriptomic data, bioenergy, plant, prokaryote, bioinformatics