Research On Biological Network Construction Algorithm Based On Multi-gene Interaction Information

Posted on: 2019-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Xing
GTID: 1360330590472808
Subject: Software engineering
Abstract/Summary:
The growth and development of organisms, their morphology, and their responses to the internal and external environment are controlled by intrinsically complex biological networks. At the gene level, gene regulatory networks (GRNs) describe this complex regulatory mechanism. Research on GRN reconstruction plays an important role in analyzing the genetic structure and regulatory mechanisms of complex traits, and it is a challenging frontier topic in systems biology and bioinformatics. In recent years, given the growing demand for food, developing methods that locate key genes and construct GRNs for important yield-related agronomic traits has become an important research topic in bioinformatics. This dissertation therefore takes the model organisms Arabidopsis and rice as its research objects and, with GRN reconstruction algorithms as its main line, focuses on the computational problems of GRNs: candidate gene identification, reconstruction algorithms, and data fusion. The main contributions fall into the following four areas.

First, a method for identifying and ranking differentially expressed genes in single time-series expression data is proposed. Because acquiring and screening biological data is costly, identifying and ranking candidate genes can advance research on GRN reconstruction. We therefore propose a differentially expressed gene identification algorithm based on a flat gene filter and spline curve fitting, followed by a new gene prioritization strategy for ranking the candidates. First, we design a flat gene filter based on the Ljung-Box test, which exploits the temporal properties of the data, to filter out flat genes. Then, we propose a B-spline model detector to identify genes that are differentially expressed in the statistical sense. Finally, we propose a gene prioritization strategy based on the principle of partner evaluation, which uses coexpression information as the partner evaluation index to rerank the differentially expressed genes. The new ranking reflects the biological importance of genes in specific processes or conditions. Experimental results show that the proposed algorithm effectively identifies differentially expressed genes in single time-series expression data and ranks them by biological significance, which helps to find key genes and to construct regulatory networks in subsequent research.

Second, a candidate gene automatic selection algorithm and an improved version based on the data processing inequality, named flooding-pruning hill-climbing, are proposed for constructing GRNs with the Bayesian network model. With the rapid development of biotechnology, a large amount of transcriptome data has accumulated, and reconstructing GRNs from these data has become a hotspot in bioinformatics. GRN reconstruction based on the Bayesian network model is attractive because of its intrinsic probabilistic nature, but current methods still suffer from scarce data and a complex search space, and cannot learn the network structure efficiently and accurately. This dissertation therefore proposes a candidate gene automatic selection algorithm (CAS). CAS uses mutual information to measure the correlation between nodes and then automatically identifies associated nodes using the idea of breakpoint detection, which reduces the search space. Because correlated nodes are not necessarily neighbor nodes, the concept of DPILevel is introduced according to the principle of the data processing inequality. The algorithm sorts all related nodes by DPILevel to distinguish neighbor nodes from indirectly related nodes, which further reduces the search space and the false positive rate. Based on DPILevel, we propose an improved flooding-pruning hill-climbing algorithm (FPHC) that speeds up network structure learning. Experimental results confirm the validity of the proposed algorithms: they effectively reconstruct GRNs and provide a bioinformatics basis for identifying key genes and analyzing genetic components.

Third, a hierarchical-clustering-guided graphical Granger causality algorithm is proposed for constructing large-scale GRNs. Constructing large-scale GRNs by computing causal relationships between genes is of great significance for understanding the structural characteristics of biological networks and for identifying candidate key nodes. GRN reconstruction based on Granger causality has received much attention for discovering causal relationships between genes. However, existing Granger-causality methods have high false positive rates, the improved graphical Granger method cannot effectively handle correlated features, and grouping genes by biological prior knowledge is limited by the scarcity of such knowledge. To mitigate these problems, this dissertation proposes an improved graphical Granger causality method based on hierarchical clustering that scales to larger data sets. First, genes are grouped by hierarchical clustering using the Pearson correlation coefficient. Then, regulators are identified with a divide-and-conquer strategy. Finally, the results are merged to construct the final GRN. Compared with related methods, the proposed method requires no biological prior knowledge, produces more accurate results, and provides a basis for analyzing network structure and for further refining and analyzing regulatory relationships.

Finally, in the area of rice data fusion, a method for constructing tissue-specific protein interaction networks for species with limited biological annotation is studied. Because annotation data are lacking, the large amount of rice omics data cannot be integrated well into usable prior knowledge. A sound bioinformatics method is urgently needed to integrate these multisource omics data and provide prior knowledge for GRN construction. Tissue-specific gene expression and protein interactions are important for studying gene regulation, protein function, and cellular processes. This dissertation therefore proposes a computational framework that combines multiple omics data to predict tissue-specific protein interaction networks for species with limited annotation. First, a unified evaluation standard and data integration method are established to identify tissue-specific genes. Then, a new Interolog mapping method is proposed to construct the protein interaction network of the target species. Finally, protein interaction subnetworks for different tissues are constructed, and highly reliable protein interactions are extracted. Using this framework, we construct the first integrated tissue-specific protein interaction network for rice and analyze it in detail to validate the effectiveness of the framework. The proposed framework and the rice tissue-specific protein interaction network, as predicted prior knowledge, support key gene discovery and multi-gene regulatory network construction for high-yield traits in rice.
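The first contribution begins with a flat gene filter based on the Ljung-Box test. The Python sketch below shows one way such a filter could work, assuming that a gene counts as "flat" when the test fails to reject the null hypothesis of no autocorrelation in its time-series profile. The function names, the `max_lag` default, and the 0.05 significance level are illustrative assumptions, not parameters taken from the dissertation.

```python
import math


def ljung_box_q(profile, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k), k = 1..max_lag."""
    n = len(profile)
    mean = sum(profile) / n
    dev = [v - mean for v in profile]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant profile: no variation, hence no autocorrelation
    total = 0.0
    for k in range(1, max_lag + 1):
        # sample autocorrelation at lag k
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k * rho_k / (n - k)
    return n * (n + 2) * total


def chi2_sf(q, df):
    """Upper-tail probability of a chi-square(df) variable via the gamma series."""
    a, x = df / 2.0, q / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a
    series = term
    for i in range(1, 1000):
        term *= x / (a + i)
        series += term
        if term < series * 1e-15:
            break
    # regularized lower incomplete gamma P(a, x); the survival function is 1 - P
    lower = series * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def is_flat(profile, max_lag=3, alpha=0.05):
    """Hypothetical rule: flat if 'no autocorrelation' is not rejected at alpha."""
    q = ljung_box_q(profile, max_lag)
    return chi2_sf(q, max_lag) > alpha
```

A steadily rising profile such as `list(range(12))` is strongly autocorrelated (Q is about 14.4 at three lags) and is kept, while a constant profile is filtered out as flat. The hand-written series is only adequate for short profiles; a library routine such as `scipy.stats.chi2.sf` is the more robust choice for the p-value.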
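The CAS step of the second contribution ranks genes by their mutual information with a target node and then cuts the ranked list at a breakpoint. The abstract does not say which breakpoint detector is used, so this sketch assumes the simplest one (the largest drop between consecutive sorted scores) together with a plug-in MI estimate on already-discretized expression levels. `mutual_information`, `select_candidates`, and the toy profiles are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discretized profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def select_candidates(target, profiles):
    """Rank genes by MI with `target` and keep those above the largest drop."""
    scores = sorted(
        ((mutual_information(target, prof), name) for name, prof in profiles.items()),
        reverse=True,
    )
    if len(scores) < 2:
        return [name for _, name in scores]
    gaps = [scores[i][0] - scores[i + 1][0] for i in range(len(scores) - 1)]
    cut = gaps.index(max(gaps))  # assumed breakpoint: position of the largest drop
    return [name for _, name in scores[: cut + 1]]


target = [0, 0, 1, 1, 0, 1, 0, 1]
profiles = {
    "g1": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the target
    "g2": [1, 1, 0, 0, 1, 0, 1, 0],  # inverted copy: same information
    "g3": [0, 1, 0, 1, 0, 1, 0, 1],  # weakly related
    "g4": [0, 0, 0, 0, 0, 0, 0, 0],  # constant: no information
}
print(select_candidates(target, profiles))  # -> ['g2', 'g1']
```

Only the two fully informative genes survive the cut; the weakly related `g3` falls below the largest drop in MI and is excluded from the Bayesian network search space.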
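The DPILevel concept rests on the data processing inequality: if X → Y → Z forms a Markov chain, then I(X;Z) ≤ min(I(X;Y), I(Y;Z)), so in a triangle of mutually correlated genes the weakest edge is a likely indirect link. The abstract does not define how DPILevel scores nodes, so the sketch below instead shows the classic ARACNE-style pruning rule built on the same inequality. `dpi_prune` and its `tolerance` parameter are hypothetical.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Drop edges that are the weakest side of a fully connected gene triangle.

    mi: dict mapping frozenset({gene_u, gene_v}) -> mutual information score.
    tolerance: hypothetical slack; 0.0 applies the inequality strictly.
    """
    nodes = sorted({g for edge in mi for g in edge})
    indirect = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(e in mi for e in edges):
            continue  # the rule only applies when all three pairs are scored
        weakest = min(edges, key=lambda e: mi[e])
        stronger = min(mi[e] for e in edges if e != weakest)
        if mi[weakest] < stronger * (1.0 - tolerance):
            indirect.add(weakest)  # likely an indirect (mediated) association
    return {e: s for e, s in mi.items() if e not in indirect}


# Chain X -> Y -> Z: the X-Z association should be explained away by Y.
scores = {
    frozenset(("X", "Y")): 0.9,
    frozenset(("Y", "Z")): 0.8,
    frozenset(("X", "Z")): 0.5,
}
kept = dpi_prune(scores)
print(sorted(tuple(sorted(e)) for e in kept))  # -> [('X', 'Y'), ('Y', 'Z')]
```

Raising `tolerance` makes the rule more conservative: with `tolerance=0.5` the X-Z edge (0.5) is no longer below 0.8 × 0.5 and is kept.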
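The third contribution first groups genes by hierarchical clustering on the Pearson correlation coefficient. A minimal average-linkage sketch follows, assuming the distance 1 − r and a hypothetical cut height `max_dist`; the abstract specifies neither the linkage criterion nor the cut rule. Each resulting group could then be analyzed separately by the graphical Granger step, which is one plausible reading of the divide-and-conquer strategy.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_genes(profiles, max_dist=0.5):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r."""
    names = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average linkage: mean pairwise distance between the two groups
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_dist:
            break  # cut the tree: the closest remaining groups are too dissimilar
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


profiles = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],
    "g3": [5, 4, 3, 2, 1],
    "g4": [10, 8, 6, 4, 2],
}
print(cluster_genes(profiles))  # -> [['g1', 'g2'], ['g3', 'g4']]
```

Genes with perfectly correlated profiles merge first, and the two anti-correlated groups stay apart because their average distance (2.0) exceeds the cut height. For real data sets, `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"` implements the same idea far more efficiently.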
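The fourth contribution relies on Interolog mapping: an interaction A-B observed in a well-annotated species is transferred to the pair A'-B' in the target species when A' and B' are orthologs of A and B. The dissertation proposes a new Interolog mapping method whose details are not given in the abstract, so this sketch shows only the classic transfer rule, followed by a simple tissue filter that keeps an edge only when both proteins belong to a tissue-specific gene set. All protein identifiers and function names are made up for illustration.

```python
def interolog_map(source_ppis, orthologs):
    """Transfer source-species interactions to target-species ortholog pairs.

    source_ppis: iterable of (protein_a, protein_b) pairs in the annotated species.
    orthologs: dict mapping a source protein to a set of target-species orthologs.
    """
    predicted = set()
    for a, b in source_ppis:
        for ta in orthologs.get(a, ()):
            for tb in orthologs.get(b, ()):
                if ta != tb:  # this sketch skips self-interactions
                    predicted.add(tuple(sorted((ta, tb))))  # undirected edge
    return predicted


def tissue_subnet(network, tissue_genes):
    """Keep interactions whose two proteins are both tissue-specific genes."""
    return {(a, b) for a, b in network if a in tissue_genes and b in tissue_genes}


# Hypothetical example: proteins A, B, C in a reference species; orthologs in rice.
ppis = [("A", "B"), ("B", "C")]
orth = {"A": {"OsA"}, "B": {"OsB1", "OsB2"}}  # C has no known ortholog
rice_net = interolog_map(ppis, orth)
print(sorted(rice_net))  # -> [('OsA', 'OsB1'), ('OsA', 'OsB2')]
print(tissue_subnet(rice_net, {"OsA", "OsB1"}))  # -> {('OsA', 'OsB1')}
```

Because one source protein can have several orthologs, a single source interaction may yield several predicted target interactions; extracting the "highly reliable" subset described above would require additional scoring that this sketch does not attempt.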
Keywords/Search Tags: gene regulatory networks, differentially expressed genes, Bayesian networks, graphical Granger causality, candidate auto-selection, tissue-specific protein interaction