Font Size: a A A

Research On The Efficient Subgraph Query In Heterogeneous Networks

Posted on:2018-02-28Degree:MasterType:Thesis
Country:ChinaCandidate:C J ZhengFull Text:PDF
GTID:2348330542968910Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As a complex data structure,graph structure can be used to describe the relationship between any two data elements.Therefore,various systems can be modeled using graph structure.Subgraph query is an important method to study complex network.Present experiments shows subgraph query is available to solve problems related to graph data.According to the query accuracy,the subgraph query can be divided into accurate subgraph query and approximate subgraph query.Nowadays,a large number of research results have been achieved for subgraph query in homogeneous networks.However,recent studies have demonstrated the power of modeling real world data as heterogeneous information networks(HINs)consisting of multiple types of entities and relations.Compared with the homogeneous network,heterogeneous network has more information,and it is a realistic way of data modeling.Therefore,in this thesis,we study the subgraph query in heterogeneous networks.The main research contents are as follows:(1)In order to solve the problem of exact subgraph query,the concept of node distance distribution is proposed firstly,and then the node distance distribution-based subgraph query algorithm(NDDSQ)is proposed.The experimental results show that the proposed algorithm can reduce the computation time considerably and possess the correctness.(2)Based on Strong Simulation algorithm in homogeneous networks,combining with the concept of meta-path in heterogeneous networks,this thesis proposes an approximate subgraph query algorithm in heterogeneous networks.The experimental results show that the algorithm not only meets the needs of the approximate subgraph query,but also improves the efficiency of the query.(3)To deal with Top-k approximate subgraph query problem in heterogeneous networks,based on meta path-based subgraph query algorithm(MPSQ),this thesis proposes an Top-k approximate subgraph query algorithm.This algorithm makes the semantic distance of node set as Top-k ranking metric.And the algorithm mainly uses greedy strategy and score limit strategy,to realize Top-k subgraph query efficiently in heterogeneous networks.(4)The distributed parallelization algorithms are designed and implemented in Spark environment including node distance distribution-based subgraph query algorithm,meta path-based subgraph query algorithm and Top-k approximate subgraph query algorithm.Finally,the experimental results show the correctness and scalability of these parallel algorithms.
Keywords/Search Tags:heterogeneous networks, subgraph query, Top-k query, parallel processing
PDF Full Text Request
Related items