| Objective: Since the completion of the Human Genome Project, elucidating the mechanisms of human diseases and their genetic basis has become a core topic of biomedical research. Traditional biological experiments are slow and costly, so, with the rapid development of bioinformatics and computer technology, researchers increasingly mine large volumes of biomedical data computationally for information relevant to disease research, an approach that has proved very effective. At present, most studies focus on predicting disease-causing genes, and relatively few address the prediction of symptom-related genes. Because symptoms are the most direct and visible manifestation of disease, studying the relationship between symptoms and genes is of great significance to both medical theory and clinical practice. Disease-gene association prediction models based on heterogeneous networks are one research direction in this field. However, when applied to symptom-related gene prediction, these models make insufficient use of node features and handle redundant information poorly, among other problems. A well-performing prediction model for symptom-gene associations would therefore advance research on disease symptoms and related biomedical information. Methods: This study proposes HMLP_GCN, an improved multi-relational graph convolution model based on heterogeneous network representation learning, for predicting and analyzing symptom-gene associations. First, the biomedical data required for the experiments were collected from public databases such as HPO, DisGeNET, and Orphanet and from the public data sets of existing studies. Four data sets were compiled: symptom-gene associations, disease-gene associations, symptom-disease associations, and protein interactions. From these, four heterogeneous graph networks (G1, G2, G3, G4) were constructed. Because the number and types of nodes in a graph affect the final prediction performance, the four networks were first evaluated with the R-GCN model. Since R-GCN does not make full use of node features, a heterogeneous multilayer perceptron is introduced to map nodes of the same type. The multilayer perceptron concatenates the feature dimensions before the graph convolution, so that node features are processed more effectively and nodes of the same type cluster together. R-GCN convolution is then applied to cluster nodes of different types. Finally, HMLP_GCN and three benchmark models are evaluated on the complete graph G4 and the three variant heterogeneous graph networks. Results: Experiments with the existing model on G1, G2, G3, and G4 show that prediction performance improves as the number and types of nodes in the graph increase. However, redundant information also grows with the number and types of nodes, and the additional noise can degrade prediction performance. On the complete graph G4, HMLP_GCN achieved best scores of 0.958, 0.910, 0.900, 0.936, and 0.917 for AUC, accuracy, precision, recall, and F1-score, respectively. Compared with the three benchmark models, AUC improved by 4.8%, 4.2%, and 5.8%, and accuracy by 3.8%, 5.8%, and 1.3%, respectively. HMLP_GCN also achieved the best scores on each of the G1, G2, G3, and G4 heterogeneous graph networks. The experimental analysis shows that HMLP_GCN can handle the information redundancy caused by the increase in the number and types of nodes. Conclusion: This paper constructs a symptom-gene association prediction model based on a heterogeneous graph neural network. The experimental results show that, for symptom-gene association prediction, the proposed method handles node features and redundant information better than existing methods, offering a useful approach for symptom-related biomedical research. |
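
The pipeline described in the Methods (a type-specific multilayer perceptron applied before a relational graph convolution) can be illustrated with a minimal, dependency-free sketch. The abstract does not give layer sizes, activations, normalization, or the scoring function, so everything concrete here is an assumption made for illustration: the toy graph, the feature dimensions, the random weights, the mean-normalized R-GCN-style update, and the sigmoid dot-product score. All function and variable names are hypothetical and do not come from the paper.

```python
import math
import random

HIDDEN = 4  # shared embedding size (illustrative assumption)

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def hetero_mlp(features, node_types, mlps):
    """Project each node with the two-layer perceptron of its own type,
    so nodes of every type end up in one shared HIDDEN-dimensional space."""
    h = {}
    for node, x in features.items():
        W1, W2 = mlps[node_types[node]]
        h[node] = matvec(W2, relu(matvec(W1, x)))
    return h

def rgcn_layer(h, edges, rel_weights, self_weight):
    """One R-GCN-style update: a self-loop term plus, for each relation,
    the mean of relation-specific transformed neighbour embeddings."""
    out = {node: matvec(self_weight, vec) for node, vec in h.items()}
    counts = {}
    for _, rel, dst in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    for src, rel, dst in edges:
        msg = matvec(rel_weights[rel], h[src])
        c = counts[(dst, rel)]
        out[dst] = [o + m / c for o, m in zip(out[dst], msg)]
    return {node: relu(vec) for node, vec in out.items()}

def link_score(h, symptom, gene):
    """Sigmoid of the embedding dot product (an assumed decoder)."""
    dot = sum(a * b for a, b in zip(h[symptom], h[gene]))
    return 1.0 / (1.0 + math.exp(-dot))

# Toy heterogeneous graph: one symptom, one disease, two genes.
rng = random.Random(0)
node_types = {"s1": "symptom", "d1": "disease", "g1": "gene", "g2": "gene"}
features = {
    "s1": [1.0, 0.0, 0.5],       # raw feature sizes differ per node type
    "d1": [0.2, 0.8],
    "g1": [1.0, 0.0, 0.0, 1.0],
    "g2": [0.0, 1.0, 1.0, 0.0],
}
in_dims = {"symptom": 3, "disease": 2, "gene": 4}
mlps = {t: (rand_matrix(HIDDEN, d, rng), rand_matrix(HIDDEN, HIDDEN, rng))
        for t, d in in_dims.items()}

# Directed (source, relation, target) triples; both directions are listed.
edges = [
    ("s1", "symptom-disease", "d1"), ("d1", "disease-symptom", "s1"),
    ("d1", "disease-gene", "g1"), ("g1", "gene-disease", "d1"),
    ("g1", "gene-gene", "g2"), ("g2", "gene-gene", "g1"),
]
relations = sorted({rel for _, rel, _ in edges})
rel_weights = {r: rand_matrix(HIDDEN, HIDDEN, rng) for r in relations}
self_weight = rand_matrix(HIDDEN, HIDDEN, rng)

h0 = hetero_mlp(features, node_types, mlps)            # per-type projection
h1 = rgcn_layer(h0, edges, rel_weights, self_weight)   # relational convolution
score = link_score(h1, "s1", "g1")
print(f"predicted association score for (s1, g1): {score:.3f}")
```

With untrained random weights the printed score carries no meaning; the sketch only shows how the data flows. In a real setting the weights would be learned from known symptom-gene pairs, and a library implementation of relational graph convolution would likely replace the hand-written layer.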