
Research On Network Representation Learning Algorithm Combined With Nodes' Label And Text Information

Posted on: 2021-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: M X Zheng
Full Text: PDF
GTID: 2428330626458921
Subject: Computer technology
Abstract/Summary:
From social networks to the World Wide Web, networks provide an intuitive and concise way to organize and store all kinds of real-world information. Because networks can contain thousands of nodes and edges, it is very difficult to perform complex reasoning over the whole network. Network representation learning, also known as network embedding or graph embedding, aims to learn low-dimensional dense vector representations of the nodes in a network and to use them as features for various tasks, such as classification, clustering, link prediction and visualization.

With the development of science and technology, many nodes carry rich external information, such as labels, text, video and audio, and together they form complex information networks. Such networks are widespread. Traditional network representation learning algorithms depend mainly on network topology and ignore this high-quality external information. How to incorporate external information into network representation learning, improve the quality of the learned representations, and enhance the effectiveness of the representation vectors in network analysis tasks is therefore a promising research topic. This thesis makes full use of the label information and text information of nodes and combines them with network topology information to further improve the quality and effectiveness of node representations. The key points and innovations of this thesis are as follows:

(1) The CNLI (Combining Nodes' Label Information) algorithm is proposed to incorporate node label information into network representation. First, an initial vector representation of each node is formed from the topological structure of the network, and implicit sequences are generated by random walks among nodes of the same category. A convolutional neural network is then introduced to optimize the node vectors using these sequences and the label information, so that the vector representations carry label characteristics: representations of nodes in the same category become similar, and the gap between representations of nodes in different categories widens. Experimental results show that the proposed algorithm improves both the micro-F1 score of node classification and the running time.

(2) A dedicated objective function is defined to address the problem that most current network representation learning algorithms lack a targeted objective function. The function characterizes the local and global probability distributions of nodes and is designed specifically to capture topology-based network information.

(3) The CNTI-Edge (Combining Nodes' Text Information-Edge) algorithm is proposed, which uses node text information for network representation. First, the text of each node is mapped to a text vector, and for each node the k nodes in the network with the closest text are found. These text-derived neighbors are added to the network as a supplementary topology to alleviate network sparsity. Using the objective function proposed in (2), one node vector representation is learned from the real topology and another from the text-based supplementary topology. Experimental results show that the proposed algorithm achieves a certain improvement in the micro-F1 score of node classification and the AUC of link prediction.

(4) The CNTI-MF (Combining Nodes' Text Information-Matrix Factorization) algorithm is proposed as another way of using text information. First, the text information of nodes is combined into a text-based node feature matrix through a neural network model and a mutual attention mechanism. A matrix for the whole network is then constructed and decomposed into several small-scale matrices by matrix factorization. The text feature matrix is incorporated into this factorization, and the text feature vectors of nodes are obtained by iterative updating. The final vector representation of each node is obtained by concatenating its text feature vector with its topology-based vector. Experimental results show that the proposed algorithm achieves a certain improvement in the micro-F1 score of node classification and the AUC of link prediction.
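
The supplementary-topology step described in item (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state how text closeness is measured, so cosine similarity is an assumption here, and the function names (`cosine`, `text_knn_edges`, `split_topologies`) and the toy vectors are hypothetical. The sketch keeps the real edges and the text-derived edges apart, since the abstract says a representation is learned from each topology separately.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed closeness measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def text_knn_edges(text_vecs, k):
    """Link every node to its k most text-similar nodes.

    text_vecs maps an integer node id to that node's text vector.
    Returns a set of undirected edges stored as (smaller_id, larger_id).
    """
    edges = set()
    for i, u in text_vecs.items():
        # Rank all other nodes by similarity; break ties by node id.
        ranked = sorted(
            ((cosine(u, v), j) for j, v in text_vecs.items() if j != i),
            key=lambda pair: (-pair[0], pair[1]),
        )
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def split_topologies(real_edges, text_vecs, k):
    """Return (real topology, text-based supplementary topology) as edge sets.

    Edges already present in the real network are left out of the
    supplementary set so the two topologies stay disjoint.
    """
    real = {(min(a, b), max(a, b)) for a, b in real_edges}
    supplementary = text_knn_edges(text_vecs, k) - real
    return real, supplementary


# Toy example: nodes 0 and 1 have similar text, as do nodes 2 and 3.
toy_vecs = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9]}
real, extra = split_topologies([(1, 0), (1, 2)], toy_vecs, k=1)
```

In this toy network node 3 has no real edges, so the text-derived edge to node 2 is what connects it; this is the sparsity problem the supplementary topology is meant to ease.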
Keywords/Search Tags: Network representation learning, Node external information, Feature extraction, Objective function