The Research Of Relational Learning In Heterogeneous Information Networks

Posted on: 2018-08-10; Degree: Master; Type: Thesis
Country: China; Candidate: Q Gu; Full Text: PDF
GTID: 2310330518495399; Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, the boom of heterogeneous information networks (HIN), and especially the emergence and development of knowledge graphs, has accelerated research on related techniques. A number of data mining tasks have been explored in these networks. Among them, link prediction, which aims to predict the links between entities, is one of the most important and is also the foundation for solving many other problems in HINs. Relational inference refers to inferring the latent relations in the network by analyzing its complex structure and the varied semantics of the heterogeneous information network, and it serves as a guideline for solving link prediction tasks.

In this paper, we first study the basic similarity measure that underlies relational inference. This paper proposes RSSim, a random path sampling algorithm based on Monte Carlo simulation, to address the time and memory costs of traditional methods based on matrix chain multiplication, such as PCRW and HeteSim. The paper also gives a theoretical proof for the required number of random walkers. Experiments show that a small number of walkers is enough to guarantee the accuracy of the similarity ranking, and an empirical formula for the similarity error is given.

The mainstream method based on path features is the Path Ranking Algorithm (PRA), which completes the link prediction task in two steps. The first step runs a traversal algorithm on the graph to find all meta paths to use as features. The second step trains a relational classification model using a meta-path-constrained random walk algorithm. Building on RSSim, this paper proposes a novel relational inference method, the subgraph path extraction algorithm. It integrates the feature selection and feature calculation processes of PRA, building features by searching and merging the subgraphs of entities, which greatly reduces the time cost of the process.

To meet the requirements of relational inference on large-scale knowledge graphs, this paper presents a distributed version of the subgraph path extraction algorithm. It consists of two steps: distributed subgraph path feature computation and distributed multi-model training. The parallel algorithm solves the low efficiency of training models on a single machine. In the distributed system, the models, partitioned by relation, are trained simultaneously, which greatly improves efficiency.
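The abstract does not spell out how RSSim works, so the following is only a minimal sketch of the general idea it names: estimating a meta-path-constrained random walk probability by sampling walkers instead of multiplying transition matrices. The toy graph, the names `EDGES`, `sample_walk`, `estimate_reach_probabilities`, and `exact_reach_probabilities`, and the walker count are all assumptions made for illustration and are not taken from the thesis.

```python
import random
from collections import Counter

# Hypothetical toy HIN: adjacency lists keyed by (source_type, target_type).
EDGES = {
    ("author", "paper"): {"a1": ["p1", "p2"], "a2": ["p2"]},
    ("paper", "venue"): {"p1": ["v1"], "p2": ["v1", "v2"]},
}


def sample_walk(start, meta_path, rng):
    """Move one walker along the meta path; return its end node, or None if stuck."""
    node = start
    for src_type, dst_type in zip(meta_path, meta_path[1:]):
        neighbors = EDGES[(src_type, dst_type)].get(node, [])
        if not neighbors:
            return None
        node = rng.choice(neighbors)  # uniform step, as in a plain random walk
    return node


def estimate_reach_probabilities(start, meta_path, num_walkers, seed=0):
    """Monte Carlo estimate: fraction of walkers that end at each node."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(num_walkers):
        end = sample_walk(start, meta_path, rng)
        if end is not None:
            hits[end] += 1
    return {node: count / num_walkers for node, count in hits.items()}


def exact_reach_probabilities(start, meta_path):
    """Reference value: propagate the full distribution step by step."""
    dist = {start: 1.0}
    for src_type, dst_type in zip(meta_path, meta_path[1:]):
        next_dist = {}
        for node, prob in dist.items():
            neighbors = EDGES[(src_type, dst_type)].get(node, [])
            for nb in neighbors:
                next_dist[nb] = next_dist.get(nb, 0.0) + prob / len(neighbors)
        dist = next_dist
    return dist


if __name__ == "__main__":
    path = ["author", "paper", "venue"]
    print("exact:    ", exact_reach_probabilities("a1", path))
    print("estimated:", estimate_reach_probabilities("a1", path, num_walkers=20000))
```

In this sketch the sampling cost grows with the number of walkers and the path length rather than with the size of the transition matrices, which is the kind of trade-off the abstract attributes to sampling-based similarity measures.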
Keywords/Search Tags: heterogeneous information networks, similarity measure, relational learning, link prediction, random walk