| Genomic studies show that a small part of the genome is indispensable for the survival and reproduction of organisms; the genes in this part are called essential genes. Essential genes are crucial to the respiration and growth of the organism, and knocking them out leads to death or infertility. Identifying essential genes not only helps us understand the minimum requirements for human survival and reproduction but also aids the search for human disease genes and new drug targets. At present, essential genes are studied mainly with two kinds of methods: experimental methods and computational methods. Experimental methods identify essential genes effectively but are expensive and time-consuming, so developing efficient computational methods to predict essential genes is a necessary and effective supplement to them. Traditional computational methods often use a single characteristic index to predict human essential genes, and their prediction accuracy is generally not high. This paper proposes integrating heterogeneous network topology data: making full use of the multiple existing networks used for essential gene prediction, it fuses several heterogeneous networks into a new network with the random walk with restart algorithm, which takes full account of the various associations between genes in the different networks. The experimental results show that the method integrating heterogeneous network topology data predicts human essential genes more accurately than prediction models based on a single network. This paper presents the prediction of human essential genes in two stages. The first stage is data collection and processing. First, we obtain human essential genes and six heterogeneous networks carrying the topological structure of genes from the DEG database and the STRING database. Second, the topological data of the six heterogeneous networks are merged using the random walk with restart algorithm, integrating the six networks into a unified gene prediction network and yielding a low-dimensional feature matrix that retains the main topological attribute information of each heterogeneous network. Finally, this matrix is used as the training sample: the SMOTE oversampling algorithm is applied to address the imbalance between positive and negative samples during SVM training, and a human essential gene prediction model is trained based on the support vector machine method. The second stage is the analysis and evaluation of the experimental results. First, the training results are analyzed, and the parameters are optimized iteratively based on the experimental results to find the optimal prediction model. Second, accuracy and the ROC curve are used to evaluate the performance of the method integrating heterogeneous network topology data against the various single-network prediction methods. The experimental results show that the method integrating heterogeneous network topology data predicts human essential genes better than each of the single-network methods. Finally, based on the same heterogeneous network topology data, the essential gene prediction performance of the support vector machine model is compared with that of a recurrent neural network model and a random forest model. The experimental results show that the support vector machine model has better prediction performance for human essential genes. |