| Deciphering the genetic basis of human disease is one of the central goals of biomedical research. Revealing the relationship between genetic diseases and pathogenic genes has long been an important goal of human genetics. The prediction of pathogenic genes is crucial for the prevention, diagnosis, and treatment of diseases. With the accumulation of protein-protein interaction data, machine learning methods can be used to find candidate pathogenic genes. Network representation learning algorithms have been shown to perform well in tasks such as clustering, node classification, and link prediction. In this work, we use network representation learning to predict pathogenic genes. We propose a novel heterogeneous network representation learning method called Multipath2vec, which aims to precisely predict the pathogenic genes of a target disease. In Multipath2vec, we first construct a human gene-phenotype heterogeneous network. We then design a multi-path, which better captures correlations between different types of vertices, to guide random walks in this network. Next, we use a network representation learning algorithm to learn features of the constructed network. Finally, we calculate the similarities between genes and the target phenotypes and predict the pathogenic genes. To overcome the sparsity of context information available for embedding learning, we further propose a new heterogeneous network representation learning algorithm called HDpath2vec, which is based on high-degree multi-path random walks. HDpath2vec also begins by constructing a human gene-phenotype heterogeneous network. It then uses heterogeneous network representation learning based on high-degree multi-path random walks to learn vector representations of the nodes in the constructed network, capturing richer structural contexts and semantics between distant nodes. Finally, we calculate the similarities and predict the pathogenic genes. We implemented Multipath2vec, HDpath2vec, and several baseline approaches on several datasets. Experimental results show that Multipath2vec and HDpath2vec outperform the state-of-the-art baselines on the pathogenic gene prediction task. |
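
The pipeline summarized in the abstract (heterogeneous network, type-guided random walks, node vectors, gene-phenotype similarity) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy network, the node names, the two type patterns standing in for a multi-path, and all function names are hypothetical. In particular, windowed co-occurrence counts are swapped in for the learned embeddings, and the high-degree walking strategy of HDpath2vec is not modeled, so this should not be read as the authors' implementation.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy network: "G" = gene, "P" = phenotype.
NODE_TYPE = {"g1": "G", "g2": "G", "g3": "G", "p1": "P", "p2": "P"}
EDGES = [("g1", "g2"), ("g2", "g3"), ("g1", "p1"),
         ("g2", "p1"), ("g3", "p2"), ("p1", "p2")]


def build_adjacency(edges):
    """Undirected adjacency lists for the heterogeneous network."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def guided_walk(adj, start, type_pattern, length, rng):
    """Random walk whose i-th node has type type_pattern[i % len(type_pattern)]."""
    walk = [start]
    while len(walk) < length:
        wanted = type_pattern[len(walk) % len(type_pattern)]
        candidates = [n for n in adj[walk[-1]] if NODE_TYPE[n] == wanted]
        if not candidates:  # no neighbor of the required type: stop early
            break
        walk.append(rng.choice(candidates))
    return walk


def cooccurrence_vectors(walks, window):
    """Stand-in for embedding learning: count neighbors within a window."""
    vecs = defaultdict(Counter)
    for walk in walks:
        for i, node in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[node][walk[j]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_genes(vecs, phenotype):
    """Rank all genes by similarity to the target phenotype, best first."""
    genes = [n for n, t in NODE_TYPE.items() if t == "G"]
    scores = {g: cosine(vecs[g], vecs[phenotype]) for g in genes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


rng = random.Random(0)
adj = build_adjacency(EDGES)
# Two hypothetical type patterns used together to guide the walks.
patterns = [["G", "P"], ["G", "G", "P"]]
walks = []
for pattern in patterns:
    starts = [n for n, t in NODE_TYPE.items() if t == pattern[0]]
    for start in starts:
        for _ in range(20):
            walks.append(guided_walk(adj, start, pattern, 8, rng))

vecs = cooccurrence_vectors(walks, window=2)
ranking = rank_genes(vecs, "p1")
print(ranking)
```

In a fuller version, the walks would be passed to a skip-gram style model to learn dense vectors, and the same cosine ranking would then be applied to those vectors.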