Font Size: a A A

A Research Of Generation And Optimization Of Query Plan Based On Graph Database

Posted on:2022-01-13Degree:MasterType:Thesis
Country:ChinaCandidate:C Y LiFull Text:PDF
GTID:2518306524480224Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In recent years,each data platform carries a large amount of entity informations.Accurate analysis of the complex intertwined relationships behind these data helps us to infer information and do decision modeling,which also makes data scientists' demand for analysing multi-layer relationship between entities increase sharply.Graph database takes entity as node model and relationship as edge model,and use graph model to ensure the query efficiency on searching multi-layer relationships.This thesis studies graph query optimization based on the characteristics of graph language Cypher.The main innovations include the following two aspects: on the one hand,because the query pattern in Cypher statement contains query nodes and query edges,the scanning order of query nodes and the extension direction of relationships are not fixed,different combinations of query graph will result in logical plan with obvious difference in execution cost and query efficiency.On the other hand,by constructing a proper distributed physical plan of graph database,the concurrent query delay of graph database can be reduced.Well designed generation strategies for distributed graph query plan and build splitting model for graph operators.Besides,we create optimization algorithm to schedule graph operators,which is also the key to concurrent query optimization for graph database.In order to meet the needs of efficient query in graph database better,we designed a graph database system that contains query optimization and enable to produce distributed execution plan.The main contents of this research include the following points:(1)Parse Cypher query language into an AST(Abstract Syntax Tree)based on libcypher-parser framework.Build clause splitting model and create logical operators.Transform the AST into a logical plan tree.(2)When creating the logical plan tree,build an algorithm model to produce logical plans in different orders to be selected.Select the logical plan in a minimum cost by the cost-based and cardinality-based model.(3)Rewrite the logical plan based on the rewriting algorithm for logical plan and the equivalence model of graph query expression.Use different strategies to rewrite the logical plan,including pushing down logical operators,replacing logical operators,combining logical operators and pruning logical plan tree,etc.(4)Transform the logical plan into a distributed physical execution plan,which is implemented by designing corresponding distributed physical operators from logical operators.Build splitting model,scheduling strategies and concurrent computing model for physical operators to improve the efficiency of concurrent execution of the graph database system.At the end of this thesis,we build a functional test for the intermediate datasets and scheduling results of distributed plan based on this graph database,and test the optimized efficiency of query graph base on cardinality and cost algorithm by using Cypher queries with different complexity,including compilation time and running time.
Keywords/Search Tags:graph database, query optimization, cost-based, distributed query plan, planning and scheduling
PDF Full Text Request
Related items