Network structure naturally expresses the connections between things, and networks appear in all aspects of daily life. How to appropriately represent the characteristic information of nodes in a network is an important research topic in graph structure analysis. Network representation learning, a family of node feature learning methods, transforms the features of network nodes into low-dimensional, dense, real-valued vectors for downstream tasks such as link prediction and node classification. However, traditional network representation learning methods do not fully consider both the global and the local features of nodes. This paper therefore proposes the SINE (Second Information Network Embedding) algorithm, which fully considers the topological characteristics of network nodes, and designs a protein similarity prediction algorithm based on network representation learning. The specific research is as follows:

(1) A network representation learning algorithm that integrates the second-order proximity of nodes is proposed. Taking the characteristics of the network topology fully into account, we design a measure for the second-order proximity of nodes and propose SINE, a method that fuses the first-order and second-order proximity of nodes, to enrich the feature vector representations of nodes and capture more of the latent information they carry. SINE is applied to three real-world datasets to reconstruct the network structure, yielding global and local feature vector representations of the nodes. Comparing SINE with node2vec and with two other network representation learning methods that incorporate second-order similarity, Structural Deep Network Embedding (SDNE) and Large-scale Information Network Embedding (LINE), we find that SINE performs better on the clustering task.

(2) A protein similarity prediction model based on network representation learning is proposed by integrating
Gene Ontology (GO) and protein data. We propose a network representation learning approach that incorporates GO data to compute protein similarity, so as to better explore the relationships between proteins. Traditional studies on evaluating the similarity of GO terms are mainly based on the information content (IC) of the terms and ignore their structural information. We therefore take the structural information of GO terms into account and apply different network representation learning methods to obtain low-dimensional vector representations of proteins that incorporate the topological information of the GO graph and of the Gene Ontology Annotation (GOA) graph, respectively. Dynamic Time Warping (DTW) and cosine similarity are used to compute protein similarity in the GO graph and the GOA graph, respectively, and link prediction experiments are then performed to evaluate the reliability of the protein similarity networks constructed by the different methods. Compared with traditional IC-based methods, protein similarity prediction based on network representation learning performs better. Among these methods, those based on random walks stand out in computing protein similarity, because adjusting the random-walk sampling strategy lets them explore more structural information. Fully considering the structural information of the graph therefore predicts protein similarity more effectively.
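The abstract does not give SINE's second-order proximity measure or the rule it uses to fuse the two proximities. As a rough illustration of the two notions that contribution (1) combines, the sketch below assumes the common LINE-style reading: first-order proximity is whether two nodes share an edge, and second-order proximity is the cosine similarity of their neighborhood vectors. The linear blend `fused_proximity` with weight `alpha` is a hypothetical stand-in, not SINE's actual formula.

```python
import math


def first_order(adj, u, v):
    """First-order proximity: 1.0 if u and v are directly linked, else 0.0."""
    return 1.0 if v in adj[u] else 0.0


def neighbor_vector(adj, u, nodes):
    """Row of the unweighted adjacency matrix for u, over a fixed node order."""
    return [1.0 if v in adj[u] else 0.0 for v in nodes]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / denom if denom else 0.0


def second_order(adj, u, v):
    """Second-order proximity: how similar the neighborhoods of u and v are."""
    nodes = sorted(adj)
    return cosine(neighbor_vector(adj, u, nodes), neighbor_vector(adj, v, nodes))


def fused_proximity(adj, u, v, alpha=0.5):
    """Hypothetical linear blend of first- and second-order proximity."""
    return alpha * first_order(adj, u, v) + (1.0 - alpha) * second_order(adj, u, v)


if __name__ == "__main__":
    # Undirected graph as node -> set of neighbors.
    # a and b are not linked, but they share the neighbors c and d.
    adj = {"a": {"c", "d"}, "b": {"c", "d"}, "c": {"a", "b"}, "d": {"a", "b"}}
    print(first_order(adj, "a", "b"))      # 0.0
    print(second_order(adj, "a", "b"))     # 1.0
    print(fused_proximity(adj, "a", "b"))  # 0.5
```

The example shows why second-order information matters: `a` and `b` have no edge, so a purely first-order view treats them as unrelated, while their identical neighborhoods give them maximal second-order proximity.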
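Contribution (2) scores protein pairs with DTW in the GO graph and with cosine similarity in the GOA graph. One plausible reading, assumed here rather than taken from the paper: in the GO graph each protein is represented by an ordered sequence of embedding vectors of its annotated GO terms, so proteins with different numbers of terms can still be aligned by DTW. In the GOA graph each protein is a single node vector compared directly by cosine similarity. The local cost `1 - cosine` and the distance-to-similarity mapping `1 / (1 + d)` in `dtw_similarity` are hypothetical choices.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors (used directly for GOA node embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / denom if denom else 0.0


def dtw_distance(seq_a, seq_b):
    """Classic O(n*m) DTW over two sequences of GO-term embedding vectors.

    Local cost between two term vectors is 1 - cosine similarity (an assumption).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - cosine_similarity(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_similarity(seq_a, seq_b):
    """Hypothetical mapping from DTW distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + dtw_distance(seq_a, seq_b))


if __name__ == "__main__":
    # Protein P1 annotated with one GO term, P2 with two (toy 2-d embeddings).
    p1 = [[1.0, 0.0]]
    p2 = [[1.0, 0.0], [0.0, 1.0]]
    print(dtw_similarity(p1, p2))                       # 0.5
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # parallel vectors, ~1.0
```

DTW fits this setting because it compares variable-length sequences by finding a monotone alignment between them. Its result depends on how the GO terms of each protein are ordered, which the abstract does not specify.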
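The abstract attributes the strength of random-walk methods to adjustable sampling but does not say which walk is used. node2vec, one of the methods compared in the paper, is a well-known example of such adjustable sampling: its return parameter `p` and in-out parameter `q` bias each step toward revisiting, staying near, or moving away from the previous node. The minimal sketch below implements that second-order biased walk on an unweighted graph. The function name `biased_walk` is hypothetical, and the code is not the paper's implementation.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """node2vec-style biased random walk on an unweighted, undirected graph.

    adj: dict mapping node -> set of neighbors.
    p: return parameter (small p -> more likely to step back to the previous node).
    q: in-out parameter (small q -> more outward, DFS-like exploration).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # fixed order so a seeded rng is reproducible
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif x in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev (BFS-like)
            else:
                weights.append(1.0 / q)  # move farther from prev (DFS-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(biased_walk(adj, 0, 8, p=0.5, q=2.0, rng=random.Random(42)))
```

Walks generated this way are typically fed to a skip-gram model to learn node embeddings. Lowering `q` pushes samples toward distant structure, which is one concrete way that "adjusting the sampling method" can expose more of the graph's structural information.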
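The abstract evaluates each protein similarity network through link prediction but does not name the metric. A common choice, assumed here, is AUC: the probability that a randomly chosen true (held-out) protein pair receives a higher similarity score than a randomly chosen non-linked pair, with ties counted as one half. The function name `link_prediction_auc` is hypothetical.

```python
def link_prediction_auc(pos_scores, neg_scores):
    """AUC by exhaustive pairwise comparison of positive vs. negative pair scores.

    pos_scores: similarity scores of protein pairs that are truly linked.
    neg_scores: similarity scores of protein pairs that are not linked.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0   # true link ranked above non-link
            elif sp == sn:
                wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    print(link_prediction_auc([0.9, 0.8], [0.1, 0.2]))  # 1.0
    print(link_prediction_auc([0.9, 0.3], [0.5]))       # 0.5
```

The exhaustive comparison costs O(|P| x |N|); on large protein networks one would usually sample a fixed number of positive/negative pairs instead.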