Gene regulatory networks (GRNs) represent the regulatory relationships among genes, and research on reconstructing them helps reveal the biological mechanisms of the cell life cycle and of cell damage repair. In recent years, advances in gene sequencing technology have provided a solid data foundation for reconstructing GRNs, and reconstruction from gene expression data has become a central problem in systems biology. However, inferring large-scale GRNs accurately and efficiently remains challenging because such networks are high-dimensional, sparse, and nonlinear. This thesis addresses these problems and proposes a large-scale gene regulatory network inference algorithm, iLSGRN, based on dimensionality reduction and multi-model fusion. It consists of a regulatory gene identification algorithm and a feature fusion algorithm. The method constructs nonlinear ordinary differential equation (ODE) models of target genes and candidate genes from time-series and steady-state gene expression data, and then reconstructs the large-scale gene regulatory network. The regulatory gene identification algorithm computes the maximal information coefficient (MIC) between genes and applies a threshold to exclude redundant regulatory relationships, thereby reducing the gene dimensionality. The feature fusion algorithm uses XGBoost and random forest (RF) models to fit the nonlinear ODE models and takes the resulting feature importances as the regulatory relationships among genes. The proposed algorithm is evaluated against other mainstream methods on the DREAM4 and Escherichia coli datasets through method comparison, cross-validation, and ablation experiments. The effect of threshold selection on the overall score is also discussed, showing that an appropriately chosen threshold for excluding redundant genes helps improve the accuracy of the inference results. The experimental results show that the regulatory gene identification algorithm effectively removes the influence of redundant genes on network identification, while the feature fusion algorithm infers the gene regulatory network accurately. The algorithm outperforms mainstream methods such as dynGENIE3, BiXGBoost, and MMFGRN, and can infer large-scale gene regulatory networks accurately. The time complexity of each method is also analyzed, and the proposed method maintains high accuracy at moderate computational complexity.
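The threshold-based screening step described above can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a plain equal-width binned mutual information estimate as a simple stand-in for the dependency score used in the thesis, and the function names, the bin count, and the threshold value are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def binned_mutual_information(x, y, bins=4):
    """Estimate mutual information (in nats) between two equal-length
    expression profiles by discretizing each into equal-width bins.
    Assumption: this is a stand-in score, not the coefficient from the thesis."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant profile: put everything in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p_ij * log(p_ij / (p_i * p_j)), written with raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def select_candidate_regulators(expression, target, threshold):
    """Keep only genes whose dependency score with `target` reaches `threshold`.
    `expression` maps a gene name to its list of expression values."""
    scores = {
        gene: binned_mutual_information(profile, expression[target])
        for gene, profile in expression.items()
        if gene != target
    }
    return {gene: s for gene, s in scores.items() if s >= threshold}


# Toy data (hypothetical): g1 drives the target nonlinearly, g2 is unrelated.
random.seed(0)
g1 = [random.uniform(-1, 1) for _ in range(300)]
g2 = [random.uniform(-1, 1) for _ in range(300)]
tgt = [v * v for v in g1]

candidates = select_candidate_regulators(
    {"g1": g1, "g2": g2, "tgt": tgt}, target="tgt", threshold=0.2
)
print(sorted(candidates))
```

In this toy setting the quadratic dependence of the target on g1 has near-zero linear correlation, which is why a dependency score that captures nonlinear relationships is used for screening. Only the genes that survive the threshold would then be passed on as input features to the downstream regression models.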