Inferring gene regulatory networks from gene expression data helps to understand biological processes in depth and promotes drug design and drug target discovery. With the generation of massive omics data, how to accurately infer gene regulatory networks from gene expression data has become an important problem in bioinformatics. Many computational methods for regulatory network inference have been proposed. However, because gene expression data are characterized by “high dimensionality, small sample size”, most methods face challenges in inferring network structure. To cope with this challenge, gene regulatory network inference based on feature selection has become a hot research topic. Addressing the problems of existing feature selection methods, this paper designs reasonable and effective regulatory network inference algorithms to improve the accuracy of the predicted network structure. The specific work includes the following aspects:

(1) Existing gene regulatory network inference methods based on feature selection cannot effectively distinguish the direct regulatory genes of a target gene from the indirect ones, which leads to low accuracy of network construction. To address this problem, a gene regulatory network inference method based on an improved Markov blanket discovery algorithm, IMBDANET, is proposed. At the core of the method is an improved Markov blanket discovery algorithm based on mutual information, conditional mutual inclusive information (CMI2), and the data processing inequality (DPI). The algorithm is first applied to identify the direct and indirect regulatory genes of each target gene. A complete gene regulatory network is then assembled from the direct regulatory genes obtained for each target gene. Finally, an importance degree score (IDS) rule is applied to optimize the network structure by processing the isolated genes in the network. The proposed method is validated on six dataset types and different network sizes, and the experimental results show that it can effectively remove redundant regulatory relationships and improve the accuracy of network inference.

(2) A single feature selection method cannot fully and accurately describe the complex regulatory relationships between genes. To address this problem, a method for predicting gene regulatory relationships based on the integration of multiple feature selection methods, MFSINET, is proposed. First, the network inference problem among N genes is decomposed into N regression problems, and bootstrap random subsampling is performed on the gene expression data for each regression problem. Then, based on the subsamples obtained, the scores of the regulatory relationships between genes are predicted using multiple feature selection methods. Finally, a global ranking of all possible regulatory edges in the network is obtained by aggregating the N individual regression problems. Validation on datasets of different network sizes shows that this method can effectively predict the complex regulatory relationships between genes.
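
The MFSINET steps described above (decomposition into N regression problems, random subsampling, scoring with several feature selection methods, and aggregation into a global edge ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The text does not name the feature selection methods that are integrated, so absolute Pearson correlation and absolute ridge coefficients stand in for them here. The function names, the subsampling fraction `frac`, sampling without replacement, and averaging of normalized scores as the aggregation rule are likewise hypothetical choices.

```python
import numpy as np


def corr_scores(X, y):
    """Stand-in scorer 1: absolute Pearson correlation with the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)


def ridge_scores(X, y, alpha=1.0):
    """Stand-in scorer 2: absolute ridge coefficients on standardized inputs."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    yc = y - y.mean()
    gram = Xs.T @ Xs + alpha * np.eye(X.shape[1])
    return np.abs(np.linalg.solve(gram, Xs.T @ yc))


def normalize(scores):
    """Rescale scores to sum to 1 so different scorers are comparable."""
    total = scores.sum()
    return scores / total if total > 0 else scores


def infer_edge_scores(expr, n_boot=50, frac=0.8, seed=0):
    """expr has shape (samples, genes); W[i, j] scores the edge i -> j."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = expr.shape
    scorers = (corr_scores, ridge_scores)  # assumed, not from the source
    m = max(2, int(frac * n_samples))
    W = np.zeros((n_genes, n_genes))
    for j in range(n_genes):  # one regression problem per target gene
        regs = [i for i in range(n_genes) if i != j]
        for _ in range(n_boot):
            # Random subsample of the expression profiles (no replacement).
            idx = rng.choice(n_samples, size=m, replace=False)
            X, y = expr[np.ix_(idx, regs)], expr[idx, j]
            for scorer in scorers:
                W[regs, j] += normalize(scorer(X, y))
    return W / (n_boot * len(scorers))


def rank_edges(W):
    """Global ranking of all candidate edges, highest score first."""
    n = W.shape[0]
    edges = [(i, j, W[i, j]) for i in range(n) for j in range(n) if i != j]
    return sorted(edges, key=lambda e: e[2], reverse=True)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    toy = rng.normal(size=(60, 5))                  # 60 samples, 5 genes
    toy[:, 1] = toy[:, 0] + 0.1 * rng.normal(size=60)  # gene 1 tracks gene 0
    for i, j, score in rank_edges(infer_edge_scores(toy))[:3]:
        print(f"gene {i} -> gene {j}: {score:.3f}")
```

Normalizing each scorer's output before averaging keeps one method from dominating the aggregate simply because its raw scores are on a larger scale; other aggregation rules, such as rank averaging, would fit the same structure.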