Research On Multi-source Heterogeneous Data Graph Fusion And Link Prediction Method Based On Graph Multiplication

Posted on: 2021-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: X W Lang
GTID: 2518306107962119
Subject: Software engineering
Abstract/Summary:
Data mining plays an important role in advancing enterprise informatization and improving decision-making efficiency. Practical applications often need to mine the connections between entities that come from different sources. The data for different entities can differ greatly in storage structure; such data are called heterogeneous data. The information in a multi-source heterogeneous data network comprises three types: the attribute information of the data nodes, the topology information within each single-type network, and the topology information of the heterogeneous network. Existing prediction methods usually use only part of this information and are limited to bipartite networks. To integrate all three types of information into the prediction process, this thesis proposes a multi-source heterogeneous data link prediction framework based on graph multiplication. The framework uses the node attribute information to generate similarities between nodes, and uses the topology information within each single-type network to extend these similarities to higher-order similarities. The weighted k-partite graph is then fused by graph multiplication into a weighted unipartite graph, which converts link prediction between multi-source heterogeneous data into a classification problem on the unipartite graph. The links between fused nodes reflect the topology of the heterogeneous network before fusion and can be used to generate feature vectors for the fused nodes and to make predictions. In addition, the framework is theoretically applicable to link prediction in third-order and higher-order networks. For data that pose a positive-unlabeled (PU) problem, a label propagation algorithm based on isolation forests is proposed, which can quickly classify the unknown links.

To evaluate the framework, the Drug-Target data set is used to test performance on the PU problem, and the Cora data set is used to test performance on the positive-negative-unlabeled (PNU) problem. Experimental results show that, compared with traditional prediction methods, the framework predicts well on both the PU and PNU data sets. Experiments with the isolation-forest-based label propagation algorithm show that it trains and predicts quickly on the PU data set while maintaining a high recall rate.
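The fusion step can be illustrated with a small sketch. The abstract does not define "graph multiplication", so the code below assumes it behaves like a tensor (Kronecker) product of the per-type similarity graphs: each fused node is a pair of nodes from two different types, and the weight between two fused nodes is the product of the corresponding within-type similarities. The function name `fuse_by_tensor_product`, the toy similarity values, and the `known_links` set are all hypothetical and are not taken from the thesis.

```python
from itertools import product


def fuse_by_tensor_product(sim_a, sim_b):
    """Fuse two weighted single-type graphs into one unipartite graph.

    Assumption (not from the thesis): fusion is a tensor product, so
    weights[i][j] = sim_a[a][a2] * sim_b[b][b2] for fused nodes
    nodes[i] = (a, b) and nodes[j] = (a2, b2).
    """
    nodes = list(product(range(len(sim_a)), range(len(sim_b))))
    weights = [
        [sim_a[a][a2] * sim_b[b][b2] for (a2, b2) in nodes]
        for (a, b) in nodes
    ]
    return nodes, weights


# Made-up similarities for 2 nodes of one type and 3 nodes of another.
sim_drugs = [[1.0, 0.5],
             [0.5, 1.0]]
sim_targets = [[1.0, 0.2, 0.0],
               [0.2, 1.0, 0.4],
               [0.0, 0.4, 1.0]]

nodes, weights = fuse_by_tensor_product(sim_drugs, sim_targets)

# Each fused node is a candidate link. Known links become positive labels;
# every other fused node stays unlabeled (0), which is the PU setting.
known_links = {(0, 0), (1, 2)}  # hypothetical observed links
labels = [1 if pair in known_links else 0 for pair in nodes]
```

Under this assumption, each row of `weights` could serve as a feature vector for the corresponding fused node, so predicting a link reduces to classifying that node.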
Keywords/Search Tags: Multiple-source heterogeneous data, Link prediction, Graph multiplication, Graph fusion