In the post-genomic era, modeling of biomolecular networks has become a powerful tool for exploring complex life activities. Constructing and analyzing gene co-expression networks is an effective way to predict the functions of unknown genes in a target species. Functionally related genes are usually co-expressed in a coordinated manner at the transcriptional level, so gene co-expression networks can intuitively reveal interconnected genes that share similar regulatory mechanisms. By applying the guilt-by-association approach to the co-expression network, starting from genes with known functions, we can predict the functions of unknown genes accurately and efficiently. Constructing a gene co-expression network requires a large amount of high-quality transcriptome sequencing data. Advances in next-generation sequencing technology have led to the accumulation of huge amounts of sequencing data for different species in public databases, which makes the construction of gene co-expression networks possible. A large amount of RNA-seq data has also accumulated for tea in recent years, and the tea plant genome has been assembled. The tea plant is the source crop of tea, yet the functions of many of its genes remain unknown and need to be studied, which hinders tea breeding and metabolic engineering. Because analyzing tea gene functions with traditional experimental methods is inefficient, there is an urgent need for new omics methods that can mine large numbers of candidate genes quickly and accurately. In this paper, we select tea as the research object and, based on a large sample of RNA-seq data, construct a high-quality gene co-expression network to accelerate the study of unknown gene functions in tea. The main work of this paper is as follows:

(1) Collection and analysis of tea transcriptome data. We searched public databases and screened out 288 qualified tea transcriptome sequencing samples. These samples were analyzed with an optimized pipeline combined with the tea genome annotation information. Finally, expression profile data for 33,932 tea genes across 261 samples were obtained.

(2) Statistical modeling of the tea gene co-expression network. We removed 109 genes that were not expressed in any sample. Based on the standardized expression profile data, we calculated the Pearson correlation coefficients between the remaining 33,823 tea genes, together with the corresponding p-values obtained by random perturbation. Gene pairs with a p-value smaller than 0.01 were retained as statistically significant. In addition, for a range of correlation coefficient thresholds, we calculated the regression fit index R² between node degree and degree distribution, as well as the average connectivity of the network, to ensure that the whole network has scale-free and small-world properties. We chose a threshold that gives both a large R² and a high average connectivity, which led to 0.7 as the suitable network threshold. After filtering with this threshold, we obtained a tea gene co-expression network with 27,158 tea genes as nodes and 2,574,341 gene pairs as edges.

(3) Gene function prediction in the tea gene co-expression network based on a random walk algorithm. To mine gene functions from the tea gene co-expression network more efficiently, we introduced the random walk with restart algorithm and combined it with the traditional correlation-based method for gene function prediction. A comparison on selected data showed that this approach greatly improves the accuracy of the prediction results.

(4) Implementation of a tea gene co-expression network database. Using the resulting co-expression network data and the relevant annotation information of the tea genome, we established TeaCoN (a platform of gene co-expression networks for the tea plant, http://teacon.wchoda.com), a database platform for tea gene co-expression networks. Tea researchers can easily browse, retrieve, and download tea gene co-expression network data through this platform. The website also integrates related visualization tools for gene function prediction, such as BLAST, GO and KEGG enrichment analysis, genome browsing (JBrowse), and expression profile visualization. Using these functions, we carried out a series of analyses on the dehydrin gene CsDHN2, which confirmed the utility of the platform.
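Step (2) begins by removing genes with no expression in any sample, standardizing the profiles, and computing Pearson correlations between genes. The following is a minimal Python sketch of those operations, assuming expression values are held in a plain dictionary from gene ID to a list of per-sample values. The function names and toy data are hypothetical, and reading "standardized" as z-score standardization is an assumption.

```python
import math

def drop_unexpressed(expr):
    """Keep only genes with a non-zero value in at least one sample.

    expr: dict gene ID -> list of expression values, one per sample.
    """
    return {g: v for g, v in expr.items() if any(x != 0 for x in v)}

def zscore(values):
    """Standardize a profile to mean 0 and population standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if sd == 0:
        return [0.0] * n  # constant profile: correlation undefined, treat as 0
    return [(x - mean) / sd for x in values]

def pearson(x, y):
    """Pearson correlation: the mean product of the two z-scored profiles."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical toy profiles over four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [0.0, 0.0, 0.0, 0.0],  # not expressed in any sample: removed
    "g4": [4.0, 3.0, 2.0, 1.0],
}
kept = drop_unexpressed(expr)
print(sorted(kept))                               # ['g1', 'g2', 'g4']
print(round(pearson(kept["g1"], kept["g2"]), 6))  # 1.0
print(round(pearson(kept["g1"], kept["g4"]), 6))  # -1.0
```

Computing r as the mean product of population z-scores is algebraically identical to the usual covariance-over-standard-deviations formula, so the standardization step and the correlation step share one helper.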
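Step (2) states that the p-values were obtained by random perturbation but does not give the procedure. One common interpretation is a permutation test: shuffle one profile many times and count how often the shuffled correlation is at least as strong as the observed one. The sketch below assumes that interpretation; the function names, the number of permutations, and the seed are illustrative choices, not settings taken from the thesis.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: no defined correlation
    return sxy / math.sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Return the observed r and its two-sided permutation p-value.

    y is shuffled n_perm times; p = (hits + 1) / (n_perm + 1), where hits
    counts shuffles with |r| >= |observed r|, so p is never exactly 0.
    """
    rng = random.Random(seed)
    r_obs = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= abs(r_obs):
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

def is_significant(x, y, alpha=0.01, **kwargs):
    """Keep a gene pair when its permutation p-value is below alpha."""
    _, p = permutation_pvalue(x, y, **kwargs)
    return p < alpha

x = [float(i) for i in range(20)]
y = [2.0 * v + 1.0 for v in x]
print(permutation_pvalue(x, y))
print(is_significant(x, y))  # True
```

With the +1 correction the smallest attainable p-value is 1 / (n_perm + 1), so n_perm must be at least 100 for any pair to pass the 0.01 cutoff. At the scale in the abstract (33,823 genes, about 5.7 × 10^8 gene pairs), a pure-Python loop like this is only illustrative; a vectorized or compiled implementation would be needed in practice.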
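To choose the correlation threshold, step (2) computes, for each candidate threshold, a regression fit index R² relating node degree to degree distribution, together with the network's average connectivity. The sketch below assumes the common convention of a least-squares line through (log10 k, log10 p(k)), takes average connectivity to be the mean degree over genes with at least one edge, and compares |r| rather than r with the threshold. All three are assumptions, not details stated in the abstract, and the helper names and toy pairs are hypothetical.

```python
import math
from collections import Counter

def degrees_at(pairs, threshold):
    """Degree of each gene after keeping pairs with |r| >= threshold.

    pairs: dict (gene_a, gene_b) -> correlation r for significant pairs.
    Genes left without any edge do not appear in the result.
    """
    deg = Counter()
    for (a, b), r in pairs.items():
        if abs(r) >= threshold:
            deg[a] += 1
            deg[b] += 1
    return deg

def scale_free_fit(deg):
    """(R^2, slope) of a least-squares line through (log10 k, log10 p(k))."""
    freq = Counter(deg.values())  # degree k -> number of genes with degree k
    ks = sorted(freq)
    if len(ks) < 2:
        return float("nan"), float("nan")  # need two distinct degrees to fit
    n_genes = len(deg)
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(freq[k] / n_genes) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else float("nan")
    return r2, slope

def average_connectivity(deg):
    """Mean degree, i.e. 2 * edges / nodes, over genes with at least one edge."""
    return sum(deg.values()) / len(deg) if deg else 0.0

def scan_thresholds(pairs, thresholds):
    """One (threshold, R^2, slope, average connectivity) row per threshold."""
    rows = []
    for t in thresholds:
        deg = degrees_at(pairs, t)
        r2, slope = scale_free_fit(deg)
        rows.append((t, r2, slope, average_connectivity(deg)))
    return rows

# Hypothetical toy pairs: at 0.7 the degrees follow p(k) proportional to 1/k.
pairs = {
    ("H", "M1"): 0.90, ("H", "M2"): 0.85, ("H", "L1"): 0.80,
    ("H", "L2"): 0.75, ("M1", "L3"): 0.95, ("M2", "L4"): 0.72,
    ("L1", "L2"): 0.30,
}
for row in scan_thresholds(pairs, [0.2, 0.7]):
    print(row)
```

A scale-free network should give a negative slope and an R² close to 1. Under the mean-degree reading, the final network in step (2) (27,158 nodes, 2,574,341 edges) would have an average connectivity of 2 × 2,574,341 / 27,158 ≈ 189.6.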
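The guilt-by-association idea in the opening paragraph can be illustrated with a simple neighbor-voting rule: a gene of unknown function is assigned the function terms that are most common among its annotated co-expression neighbors. This is a hypothetical baseline for illustration only, not necessarily the correlation-based method used in the thesis; the function name and the term labels are made up.

```python
from collections import Counter

def predict_by_neighbors(gene, adj, annotations, top=3):
    """Rank function terms for `gene` by votes from annotated neighbors.

    adj: dict gene -> list of co-expressed neighbor genes (edges kept
         after thresholding).
    annotations: dict gene -> set of function terms for genes of known
                 function; genes of unknown function are simply absent.
    Returns up to `top` (term, share of annotated neighbors carrying it)
    pairs, highest share first, ties broken alphabetically.
    """
    annotated = [n for n in adj.get(gene, []) if n in annotations]
    if not annotated:
        return []
    votes = Counter(t for n in annotated for t in annotations[n])
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, count / len(annotated)) for term, count in ranked[:top]]

# Hypothetical toy data: "x" has unknown function, "d" is also unannotated.
adj = {"x": ["a", "b", "c", "d"]}
annotations = {"a": {"term_A"}, "b": {"term_A", "term_B"}, "c": {"term_B"}}
print(predict_by_neighbors("x", adj, annotations))
# [('term_A', 0.6666666666666666), ('term_B', 0.6666666666666666)]
```

Only direct neighbors vote here, which is the main limitation a network-propagation method such as random walk with restart addresses.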
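Step (3) introduces random walk with restart (RWR). In its standard form, a walker starts from seed genes of known function and, at each step, either moves to a neighbor with probability proportional to edge weight or jumps back to the seeds with a fixed restart probability c; iterating p ← (1 − c) W p + c p0 to convergence gives steady-state visiting probabilities that rank the remaining genes. The sketch below implements that standard iteration in plain Python. Using correlation coefficients as edge weights and a restart probability of 0.7 are illustrative assumptions, and the way the thesis combines RWR with the correlation-based method is not reproduced here.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adj: dict gene -> dict neighbor -> positive edge weight. The network is
         undirected, so every edge must be listed under both endpoints.
    seeds: genes of known function; p0 spreads restart mass evenly over them.
    restart: probability of jumping back to the seeds at each step
             (0.7 is an illustrative value, not taken from the thesis).
    Returns dict gene -> steady-state visiting probability.
    """
    seeds = {s for s in seeds if s in adj}
    if not seeds:
        raise ValueError("no seed gene is present in the network")
    p0 = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in adj}
    # Weighted degree of each gene; dividing by it column-normalizes W.
    strength = {g: sum(nbrs.values()) for g, nbrs in adj.items()}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {g: restart * p0[g] for g in adj}
        for g, nbrs in adj.items():
            if strength[g] == 0:
                continue  # isolated gene: its walk mass is not propagated
            share = (1.0 - restart) * p[g] / strength[g]
            for m, w in nbrs.items():
                nxt[m] += share * w
        converged = sum(abs(nxt[g] - p[g]) for g in adj) < tol
        p = nxt
        if converged:
            break
    return p

# Hypothetical chain a - b - c - d with "a" as the only known-function seed.
chain = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = random_walk_with_restart(chain, ["a"])
print(sorted((g for g in scores if g != "a"), key=scores.get, reverse=True))
# ['b', 'c', 'd']
```

Unlike direct neighbor voting, RWR also credits genes that are reachable from the seeds through several short paths, which is one reason it is often paired with correlation-based neighborhoods. Because W is column-normalized and the network has no isolated genes, the scores keep summing to 1 at every iteration.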