Font Size: a A A

Research Reachability Query Method For Large-Scale Graphs

Posted on:2018-08-24Degree:MasterType:Thesis
Country:ChinaCandidate:H ZhangFull Text:PDF
GTID:2348330512987362Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
As a kind of relational data,graph data is always used to describe some particular relationships between objects,it has been widely used in geographic network,biological information,management of tourism resources and XML databases,etc.With the development of computer technology,the size of data increase quickly,and so are the complexity and value.Reachability query play a fundamental role in graph analysis,which has been studied by many scholars for a long time.Except traditional methods such as transitive closure and traversal algorithm,existing methods focus on index at all time.However these existing methods cannot satisfy query on large-scale graphs for low query efficiency and irrational index.And the increasing graph lead to a single machine can no longer meet the current demand.To solve these problems,this thesis proposes a reachability query method on large-scale graphs named folding tree coding index method.It consists of offline pretreatment and online query.It adopts distributed technology and Hadoop computing platform,and based on the principle of high cohesion and loose coupling,we partition the graphs to some subgraphs for reducing communication,in result,nodes in same subgraph stay in close touch and nodes between subgraphs communicate few.In the offline pretreatment,this thesis presents a folding tree coding index method(FTCI),this method first remove two kinds of cycle-type structure,one is strong connected component,the other one is cycle-type structure that has topology relations,and it proposes a result marker mechanism based on B+ tree to retain partition results on subgraphs.Then it has transformed subgraphs to forest-type.Finally,FTCI construct folding trees and create similar Huffman code to save the reachability information within each subgraph and among subgraphs.In the online query,this thesis proposes a reachability query method based on FTCI named Folding Tree Coding Index Method Distributed Reachability Query(FTCI-DRQ),according to the subordination between two query nodes,FTCI-DRQ provides the strategy within subgraphs and among subgraphs based on the index and the B+ tree result marker mechanism.Finally,by experiment on two kinds of real datasets to analyze and compare some existing methods on index creation time,index storage cost and query time,experimental results show that FTCI-DRQ has improved the query efficiency and reduce the storage cost,and it has good performance on queries on large-scale graphs.
Keywords/Search Tags:distributed system, large-scale graphs, reachability query, similar Huffman code
PDF Full Text Request
Related items