Gene Regulatory Network Reconstruction For Dynamic Gene Expression Data By RNA-seq

Posted on: 2017-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L B Jiang
GTID: 1360330575491556
Subject: Tree genetics and breeding
Abstract/Summary:
RNA-seq technology is an important tool for functional genomics research in the post-genome era. It quantifies the genome-wide gene expression map at a given moment in time, allowing a better understanding of genome structure, gene expression patterns and gene regulatory networks. However, once complex, high-dimensional RNA-seq data have been obtained, how to analyze them in depth to mine key functional genes or modules remains an issue that urgently needs to be addressed. Constructing gene regulatory networks is an important method for analyzing RNA-seq data: it can predict the interactions between genes and thus identify the key regulatory hubs and regulatory relationships. Many methods for constructing gene regulatory networks exist, but they still have many shortcomings. In this study, first, a novel clustering method based on the Skellam distribution is proposed to reduce the complexity of gene expression data, taking into account the discreteness and high dimensionality of RNA-seq data and the spatiotemporal properties of gene regulatory networks. Second, to further quantify the interaction effects between genes and overcome the marginal effect in existing network construction methods, a new gene regulatory network construction method is developed by combining game theory with high-dimensional systems of ordinary differential equations (ODEs).

When the living environment of an organism changes, overall gene regulation changes to adapt to the new environment, giving rise to plasticity of gene expression at the spatial level. Considering the discrete nature of RNA-seq data and the plastic expression of genes, a finite mixture model was constructed based on the Skellam distribution, and an EM algorithm for estimating the unknown parameters was derived within the mixture model framework. AIC was used to determine the optimal number of clusters. On the basis of gene clustering, two kinds of biologically meaningful hypotheses were proposed to test whether expression patterns differ among genes within a cluster and between different clusters. Computer simulation was used to study how the method of choosing initial values for the unknown parameters and the method of standardizing the gene expression data affect the clustering performance of the new method, and to compare the performance of different clustering methods on plastic expression data. The simulation results show that clustering performance is highest with the model-based method and that the normalization method has little effect on clustering performance. Compared with K-means and SOM, the new method has the highest clustering performance. The simulations also show that the AIC criterion selects the true number of clusters accurately and that the estimated parameters of each cluster are close to the true parameters. Dynamic transcriptome data from the roots of Populus euphratica were then analyzed, with the new clustering method applied to the differentially expressed genes. This tested the applicability of the new clustering method and uncovered plastic expression modules related to salt resistance in Populus euphratica. The real data analysis indicates that module 4 is an important plastic expression functional module. Judged against the plastic functional modules obtained from the real data analysis and the GO classification of the differentially expressed genes, the new method gave the highest clustering performance. The hypothesis tests showed that gene expression within the modules differed significantly between the two conditions, and that expression patterns differed significantly among clusters.

A gene regulatory network is a complex, dynamic, high-dimensional system. Based on evolutionary game theory combined with high-dimensional ordinary differential equations, the new method describes the complex linear and nonlinear game relations between genes at the system level and quantifies the interaction effects between genes. The games can be divided into six categories: "win-win", "lose-lose", "selfish", "help others", "harm others without benefiting oneself" and "live in peace". Different parameter estimation methods were integrated into the new model. A biologically meaningful hypothesis test is proposed, in the framework of maximum likelihood estimation or nonlinear least squares, to detect the interacting genes present in the system. The estimated parameters or interaction effect curves of the genes are then interpreted in terms of these game relations. The network is constructed in four steps. First, plastic expression clustering is applied to the gene expression data to reduce the complexity of the data. Second, the smoothed mean expression values of each cluster, or the smoothed expression values of each gene, are estimated with a smoothing function. Third, group LASSO and adaptive group LASSO are used for a preliminary screening of the significantly interacting genes. Fourth, a high-dimensional ODE system is constructed for the initially screened genes, hypothesis tests are carried out in the nonlinear least squares framework, and the interaction effects between genes are estimated, completing the construction of the gene regulatory network. In the real data analysis, the newly developed construction method found three important hub modules in the plastic expression module network, among which hub module 4 contains a large number of transcription factors. The transcription factors ERF061 and BHLH92 were found among the hub genes. These transcription factors may be related to the response of Populus euphratica to salt stress. In the real data analysis, the gene regulatory network constructed by the new method is more biologically meaningful than those constructed by other methods, and the hub genes are strongly associated with the salt resistance of Populus euphratica. Computer simulation shows that the new method performs better than other methods, with a higher true positive rate and a low false positive rate, and that it can accurately estimate the interaction effects between genes.

The newly developed plastic expression clustering method can uncover the plastic expression patterns of genes, reduce the dimension of the data, and integrate the temporal and spatial patterns of gene expression, laying the foundation for gene regulatory network construction. On the basis of plastic expression clustering, the interaction effects of genes were quantified at the system level using game theory combined with ODEs. The new method is especially suitable for large-scale dynamic RNA-seq data; open-source software developed based on the new method can be freely downloaded from the website ccb.bjfu.edu.cn...
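
The six game categories named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the categories are assigned, so the mapping below is an assumption: each pair of genes is labelled from the signs of the two directed interaction effects between them, with a scalar summary standing in for each interaction effect curve. The function names, the tolerance `tol` and the example matrix are hypothetical and are not taken from the dissertation or its software.

```python
from typing import Dict, List, Tuple

# Assumed mapping from the sorted pair of effect signs to a game category.
# This is an illustrative reading of the six names, not the author's definition.
GAME_BY_SIGNS: Dict[Tuple[int, int], str] = {
    (1, 1): "win-win",          # both genes promote each other
    (-1, -1): "lose-lose",      # both genes inhibit each other
    (-1, 1): "selfish",         # one gene gains while the other is inhibited
    (0, 1): "help others",      # one gene is promoted, the other is unaffected
    (-1, 0): "harm others without benefiting oneself",
    (0, 0): "live in peace",    # no detectable effect in either direction
}


def sign(value: float, tol: float = 1e-8) -> int:
    """Return +1, -1 or 0, treating |value| <= tol as no effect."""
    if value > tol:
        return 1
    if value < -tol:
        return -1
    return 0


def classify_pair(effect_on_i: float, effect_on_j: float, tol: float = 1e-8) -> str:
    """Label a gene pair from the effect each gene receives from the other."""
    first, second = sorted((sign(effect_on_i, tol), sign(effect_on_j, tol)))
    return GAME_BY_SIGNS[(first, second)]


def classify_network(effects: List[List[float]], tol: float = 1e-8) -> Dict[Tuple[int, int], str]:
    """effects[i][j] is the (hypothetical) summarized effect of gene j on gene i."""
    n = len(effects)
    return {
        (i, j): classify_pair(effects[i][j], effects[j][i], tol)
        for i in range(n)
        for j in range(i + 1, n)
    }


# Hypothetical 3-gene example; the values are made up for illustration.
example_effects = [
    [0.0, 0.8, -0.3],
    [0.5, 0.0, 0.0],
    [0.4, 0.0, 0.0],
]
example_games = classify_network(example_effects)
```

In this example, genes 0 and 1 promote each other ("win-win"), gene 0 promotes gene 2 while being inhibited by it ("selfish"), and genes 1 and 2 do not affect each other ("live in peace"). In practice, which effects count as zero would presumably be decided by the hypothesis tests described in the abstract rather than by a fixed tolerance.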
Keywords/Search Tags:RNA-seq, Skellam distribution, Cluster analysis, Game theory, Gene regulatory network, Populus euphratica