
Distributed Query System For Large Scale Knowledge Graph

Posted on: 2021-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: H J Zhou
GTID: 2518306107468714
Subject: Computer technology
Abstract/Summary:
A knowledge graph is a network that describes real-world entities and concepts as well as the relationships between them. Knowledge graphs make it easier to query complex association information, to understand users' intentions at the semantic level, and to find the information users need more accurately. Major search engines have begun to study knowledge graphs and plan to use them as the underlying data structure of next-generation search engines. Most existing work focuses on the data structure design and processing optimization of distributed stored procedures and has been effective, but research on load balancing and graph joins still needs a major breakthrough. This thesis designs and implements a distributed query system for large-scale knowledge graphs.

Firstly, an abstract data class and a real data class are designed so that join-strategy selection and single-node queries on the data nodes can run in parallel. The abstract data class, used by management nodes, does not store real data; it stores only the methods by which data nodes acquire data, which reduces data movement. The real data class, used by data nodes, keeps intermediate results in memory to reduce I/O time. Each abstract query class corresponds one-to-one to a real data class. The abstract query class controls the data granularity of data nodes through merge, division, and join operations, which enables parallel joins across execution nodes. When execution on a data node fails, the data node can recover its data from the lineage of the abstract query class.

Secondly, a join tree is introduced to implement a distributed join algorithm based on data distribution. A join tree is an ordered set of abstract data classes that produces the final result through merge, division, and join operations. The execution node manages the data flow and obtains the final result according to the join tree. The management node maintains a statistical index of the stored data to predict the number of query results. Based on this prediction, a join tree is built using dynamic programming to reduce the number of join executions and improve overall join efficiency. The management node periodically collects statistics on the load of the cluster. Based on these statistics, the join tree is decomposed into a join forest, again using dynamic programming. Each execution node selects the optimal join tree according to its own load, which balances the load of the cluster.

Then, asynchronous, non-blocking communication is used for data transmission, and the subjects, predicates, and objects in the knowledge graph are converted into fixed-length codes to improve interaction efficiency.

Finally, a distributed query system based on data distribution is designed and implemented, which supports different kinds of data. To verify the efficiency of the distributed join algorithm based on data distribution, three traditional distributed join algorithms are ported to the same system for comparative testing. A large-scale synthetic data set and internationally standard distributed query benchmarks are used to make the evaluation more authoritative. Five cluster sizes and five data set sizes are used to verify the scalability of the query system. Experimental results show that the distributed join algorithm based on data distribution achieves higher parallelism and lower running time than the traditional distributed join algorithms, and that the distributed query system based on data distribution scales well.
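The abstract does not give the details of how the join tree is built, so the following is only a minimal sketch of the general idea: dynamic programming over subsets of inputs, guided by predicted result sizes. The cost model (sum of estimated intermediate result sizes), the pairwise selectivity table, and the names `estimate_size` and `best_join_tree` are all assumptions made for illustration, not the thesis's actual method.

```python
from itertools import combinations


def estimate_size(subset, card, sel):
    """Predict the result size of joining every input in `subset`.

    Assumed model: product of input cardinalities times the selectivity
    of each joined pair (pairs without an entry count as a cross product).
    """
    size = 1.0
    for name in subset:
        size *= card[name]
    for a, b in combinations(sorted(subset), 2):
        size *= sel.get(frozenset((a, b)), 1.0)
    return size


def best_join_tree(card, sel):
    """Return (cost, tree) minimizing the summed intermediate result sizes.

    `tree` is a nested tuple of input names, e.g. ("A", ("B", "C")).
    """
    names = sorted(card)
    # Base case: a single input needs no join, so its cost is zero.
    best = {frozenset([n]): (0.0, n) for n in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            out_size = estimate_size(subset, card, sel)
            choice = None
            # Try every split of the subset into two non-empty halves.
            for i in range(1, k // 2 + 1):
                for left in combinations(combo, i):
                    l = frozenset(left)
                    r = subset - l
                    cost = best[l][0] + best[r][0] + out_size
                    if choice is None or cost < choice[0]:
                        choice = (cost, (best[l][1], best[r][1]))
            best[subset] = choice
    return best[frozenset(names)]


# Hypothetical statistics for three triple patterns A, B, and C.
card = {"A": 1000, "B": 10, "C": 100}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.05}
cost, tree = best_join_tree(card, sel)
print(cost, tree)
```

With these made-up statistics, the sketch joins the two inputs with the smallest predicted intermediate result first, which is the effect the abstract attributes to prediction-guided join tree construction. A real system would also need to fold in data placement and node load, which this sketch leaves out.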
Keywords/Search Tags:Knowledge graph, Distributed graph processing platform, Distributed query, Plan of join