
Research On Network Representation Learning Methods For Literature Data

Posted on: 2020-05-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Yin
GTID: 2370330620453196 | Subject: Information and Communication Engineering
Abstract:
Acquiring literature information is an important part of scientific research, and extracting useful information from massive literature data is a common problem for researchers. Literature data can be modeled as networks for analysis, but the scale and complexity of literature networks make them difficult for computers to process. It is therefore important to transform literature networks into suitable representations and apply them efficiently to tasks such as author classification, article similarity search, and co-author relationship prediction. To overcome the high computational complexity of traditional network representation methods and their difficulty in integrating heterogeneous information, researchers have proposed network representation learning, also known as network embedding. Network representation learning represents the nodes of the original network as low-dimensional dense vectors, which can then serve as input to machine learning models for downstream tasks.

With the rapid development of science and technology, the volume of literature data has grown exponentially. Existing network representation learning methods face three main challenges when dealing with literature data:
1) Literature data contains different types of entities, such as authors, papers, and conferences, and the complex interactions between these entities carry rich semantic information. However, when processing the literature information networks built from such data, existing heterogeneous network representation learning methods struggle to take the user's needs into account while preserving this semantic information.
2) Literature data changes constantly over time, so the literature information networks formed from it are clearly dynamic. However, existing dynamic homogeneous network representation learning methods struggle to capture both the evolution characteristics of the network and the structural information of its historical states.
3) The semantic information in literature data also changes over time, and the current semantic information is closely related to the historical semantic information. However, existing network representation learning methods struggle to preserve the historical semantic information of literature networks.

To address these problems, this thesis abstracts literature data into different network forms and studies network representation learning methods for each. The main contributions are as follows:
1. Because existing methods struggle to capture the multiple kinds of semantic information in literature information networks, this thesis models literature data as a heterogeneous information network and proposes Subgraph2vec, a heterogeneous network representation learning method based on homogeneous subgraph transformation, which improves node classification and similarity search results. The method combines the semantic information contained in different meta-paths to construct weighted edges that characterize the relationships among nodes of the same type, yielding weighted homogeneous subgraphs. It then performs biased random walks on these subgraphs to obtain node sequences, which are fed into the Skip-gram model as node "contexts" to learn the node embeddings. Experiments on multiple real-world datasets show that the method can selectively learn node embeddings according to the user's needs, and that the learned embeddings outperform the baseline algorithms in downstream tasks such as node classification and similarity search.
2. Because existing methods struggle to capture the dynamic evolution characteristics and historical structural information of literature networks, this thesis models literature data as a dynamic homogeneous network and proposes MHDNE, a dynamic network representation learning method based on the Hawkes process, which improves node classification and link prediction results. The method treats the generation of new edges as a time sequence and, using the Hawkes process, integrates historical edge information and the network's evolution properties into the generation of new edges. By integrating the multivariate Hawkes process into network embedding, MHDNE captures both the historical information and the evolution characteristics of dynamic networks, which existing network representation learning methods cannot do effectively. Experiments on multiple real-world datasets show that MHDNE effectively integrates the dynamic evolution information and historical structural information of literature networks into node embeddings, so that the embeddings it learns outperform the comparison algorithms in downstream tasks such as node classification and visualization.
3. Because existing methods struggle to preserve the historical semantic information of literature information networks, this thesis models literature data as a dynamic heterogeneous network and proposes DHNE, a dynamic heterogeneous network representation learning method based on network augmented graphs and a modified Skip-gram model, which improves node classification and node trajectory classification results. The method treats the dynamic heterogeneous network as a series of snapshots taken at different times and constructs network augmented graphs, each containing the multiple snapshots that fall within a time step, to fuse the current and historical information of the dynamic network. Guided by meta-paths, it then performs biased random walks on the augmented graphs to obtain node sequences that contain both the semantic and the structural information of the dynamic network. Finally, a modified Skip-gram model learns the node embeddings. Experiments on multiple real-world datasets show that DHNE effectively preserves the historical semantic information of literature networks, and that its node embeddings outperform the baseline algorithms in downstream tasks such as node classification, node trajectory classification, and visualization.
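The Subgraph2vec step that turns meta-paths into weighted edges between same-type nodes can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the two meta-paths are A-P-A (co-authorship) and A-P-V-P-A (papers at the same venue), an author pair's edge weight is a user-chosen weighted sum of its meta-path instance counts, and the helper names (`apa_counts`, `apvpa_counts`, `weighted_author_subgraph`) and weights `w_apa` and `w_apvpa` are hypothetical. The thesis's actual weighting scheme may differ.

```python
from collections import defaultdict
from itertools import combinations


def apa_counts(author_papers):
    """A-P-A instances: number of papers each author pair wrote together."""
    paper_authors = defaultdict(set)
    for author, papers in author_papers.items():
        for paper in papers:
            paper_authors[paper].add(author)
    counts = defaultdict(int)
    for authors in paper_authors.values():
        for a, b in combinations(sorted(authors), 2):
            counts[(a, b)] += 1
    return counts


def apvpa_counts(author_papers, paper_venue):
    """A-P-V-P-A instances: pairs (paper of a, paper of b) published at the same venue."""
    per_venue = defaultdict(lambda: defaultdict(int))  # venue -> author -> #papers there
    for author, papers in author_papers.items():
        for paper in papers:
            per_venue[paper_venue[paper]][author] += 1
    counts = defaultdict(int)
    for authors in per_venue.values():
        for a, b in combinations(sorted(authors), 2):
            counts[(a, b)] += authors[a] * authors[b]
    return counts


def weighted_author_subgraph(author_papers, paper_venue, w_apa=1.0, w_apvpa=0.5):
    """Hypothetical combination rule: weighted sum of meta-path instance counts."""
    edges = defaultdict(float)
    for pair, c in apa_counts(author_papers).items():
        edges[pair] += w_apa * c
    for pair, c in apvpa_counts(author_papers, paper_venue).items():
        edges[pair] += w_apvpa * c
    return dict(edges)
```

Raising `w_apvpa` relative to `w_apa` shifts the subgraph toward venue-level similarity. This is one plausible way to express the "user's needs" that the abstract says Subgraph2vec can accommodate.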
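Subgraph2vec then performs biased random walks on the weighted homogeneous subgraph and uses the walks as Skip-gram "contexts". The abstract does not specify the bias. This sketch assumes the simplest one: the next node is drawn with probability proportional to edge weight. A node2vec-style return/in-out bias would be an alternative. The sketch also shows how a walk becomes (center, context) pairs for a given window size. The function names are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected weighted adjacency from {(u, v): weight}."""
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def weighted_walk(adj, start, length, rng):
    """Walk where each step picks a neighbor with probability proportional to edge weight."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj.get(walk[-1])
        if not nbrs:
            break  # isolated node: stop early
        nodes, weights = zip(*nbrs)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, seed=0):
    """num_walks walks from every node, visiting start nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(weighted_walk(adj, node, length, rng))
    return walks


def skipgram_pairs(walk, window):
    """(center, context) pairs within `window` positions of each center node."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs
```

In practice, the list returned by `generate_walks` could be passed as `sentences` to gensim's `Word2Vec` with `sg=1` (Skip-gram). This assumes gensim 4.x, where the dimension parameter is `vector_size`.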
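The Skip-gram model named for Subgraph2vec and DHNE is commonly trained with negative sampling. The sketch below shows one stochastic gradient step of standard skip-gram with negative sampling on plain Python lists, minimizing -log σ(u_c·v) - Σ log σ(-u_n·v). It is the unmodified textbook objective. The abstract does not describe how DHNE's "modified Skip-gram model" differs, so that change is not shown. `sgns_step` is a hypothetical name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgns_step(v_center, u_context, u_negatives, lr=0.025):
    """One SGD step of skip-gram with negative sampling.

    Vectors are lists updated in place; returns the loss before the update.
    """
    s_pos = sigmoid(dot(u_context, v_center))
    s_negs = [sigmoid(dot(u, v_center)) for u in u_negatives]
    # -log sigma(-x) == -log(1 - sigma(x))
    loss = -math.log(s_pos) - sum(math.log(1.0 - s) for s in s_negs)
    # All gradients are computed before any vector changes.
    g_center = [(s_pos - 1.0) * uc for uc in u_context]
    for s, u in zip(s_negs, u_negatives):
        g_center = [g + s * ui for g, ui in zip(g_center, u)]
    g_context = [(s_pos - 1.0) * vc for vc in v_center]
    g_negs = [[s * vc for vc in v_center] for s in s_negs]
    for i in range(len(v_center)):
        v_center[i] -= lr * g_center[i]
        u_context[i] -= lr * g_context[i]
    for u, g in zip(u_negatives, g_negs):
        for i in range(len(u)):
            u[i] -= lr * g[i]
    return loss
```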
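Once node embeddings are learned, a downstream task such as article similarity search reduces to ranking nodes by vector similarity. This sketch assumes cosine similarity, a common choice that the abstract does not specify. `top_k_similar` is a hypothetical helper, and ties are broken by node name so that results are deterministic.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def top_k_similar(emb, query, k):
    """The k nodes most similar to `query`, excluding the query itself."""
    scores = [(cosine(emb[query], vec), node) for node, vec in emb.items() if node != query]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [node for _, node in scores[:k]]
```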
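MHDNE models the arrival of new edges with a Hawkes process, in which past events raise the rate of future ones. The sketch below shows a classic self-exciting intensity for a new edge (x, y) at time t, where h ranges over x's earlier neighbors with edge times t_h < t:

λ(x, y, t) = μ(x, y) + Σ_h α(h, y) · exp(-δ (t - t_h))

The embedding-based terms are assumptions of this sketch: μ(x, y) = exp(-||e_x - e_y||²), α(h, y) = exp(-||e_h - e_y||²), and a fixed decay rate δ. The sketch is also univariate per edge. The thesis's multivariate formulation and its training objective are not given in the abstract. `hawkes_intensity` is a hypothetical name.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hawkes_intensity(emb, x, y, t, history, delta=1.0):
    """Conditional intensity of a new edge (x, y) at time t.

    history: list of (h, t_h), x's earlier neighbors and their edge times.
    """
    base = math.exp(-sq_dist(emb[x], emb[y]))  # mu: base rate from current embeddings
    excitation = 0.0
    for h, t_h in history:
        if t_h >= t:
            continue  # only past events can excite the process
        alpha = math.exp(-sq_dist(emb[h], emb[y]))  # influence of neighbor h on target y
        excitation += alpha * math.exp(-delta * (t - t_h))  # exponential time decay
    return base + excitation
```

Because both μ and α are positive, a historical neighbor that sits close to y in embedding space raises the intensity, and more recent neighbors raise it more. This is how the intensity folds historical edges and evolution over time into the probability of a new edge.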
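DHNE fuses current and historical information by building an augmented graph from the snapshots within a time step. The abstract does not say how the snapshots are combined. This sketch assumes one simple construction: edges from snapshots t - window + 1 through t are merged into a single weighted adjacency, and an edge observed at time k contributes `decay ** (t - k)`, so that recent edges weigh more. The thesis may instead keep separate per-snapshot node copies. `build_augmented_graph`, `window`, and `decay` are hypothetical.

```python
from collections import defaultdict


def build_augmented_graph(snapshots, t, window, decay=0.5):
    """Merge snapshots t-window+1 .. t into one weighted adjacency {node: {neighbor: weight}}.

    snapshots[k] is the undirected edge list [(u, v), ...] observed at time k.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for k in range(max(0, t - window + 1), t + 1):
        w = decay ** (t - k)  # older snapshots contribute less
        for u, v in snapshots[k]:
            adj[u][v] += w
            adj[v][u] += w
    return {u: dict(nbrs) for u, nbrs in adj.items()}
```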
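DHNE's meta-path-guided biased walks can be sketched as follows, over a weighted heterogeneous adjacency such as one fused from several snapshots. The walk repeats a symmetric meta-path (for example A-P-A or A-P-V-P-A) cyclically. At each step it keeps only neighbors of the type the meta-path requires and picks among them in proportion to edge weight. The weight-proportional bias and the name `metapath_walk` are assumptions, since the abstract does not define the bias.

```python
import random


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Meta-path-guided, weight-biased random walk.

    adj: {node: {neighbor: weight}}; node_type: {node: type};
    metapath: symmetric type sequence such as ["A", "P", "A"] (first == last).
    """
    if metapath[0] != metapath[-1] or node_type[start] != metapath[0]:
        raise ValueError("start node type must match a symmetric meta-path")
    cycle = metapath[:-1]  # repeated as A, P, A, P, ... for A-P-A
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]  # type required at the next position
        nbrs = [(n, w) for n, w in adj.get(walk[-1], {}).items() if node_type[n] == wanted]
        if not nbrs:
            break  # dead end under this meta-path
        nodes, weights = zip(*nbrs)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk
```

Restricting each step to one node type is what lets the resulting sequences carry the semantics of the chosen meta-path. The edge weights let recent or frequent relations dominate.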
Keywords: Network representation learning, literature information network, heterogeneous information network, dynamic network, Hawkes process