| Along with the rapid development of the internet,massive data is rapidly expanding and the relationships between entities are becoming more and more complex and diverse.Traditional relational databases have difficulty in querying and computing in a timely and effective manner when faced with the structure of mesh relationships such as social networks and knowledge graphs.While graph database uses the form of vertices and edges to represent entities and inter-entity relationships.So compared with the relational database,graph database has greater performance advantages in graph querying and graph computing tasks in the face of complex mesh data.As the volume of data rises,a distributed architecture is required to design the graph database system in order to meet the storage and computation needs of large volumes of data.In the face of complex graph query tasks under concurrent scenarios,the scheduling and allocation methods for distributed resources can have a significant impact on the latency of a single query and the overall throughput rate of the system.Based on a distributed graph database system architecture with storage and computation separation,this thesis proposes a computational scheduling framework to optimise query plan trees,reasonably schedule resources in parallel query scenarios,complete graph queries and graph computation tasks more efficiently,and improve query efficiency.The main works are as follows:1.This thesis designs a cardinality estimation method and query optimisation method for graph database.The query cardinality and execution cost are predicted using an unbiased sampling method for graph-oriented patterns.Based on the rule and cost model,this thesis implements arithmetic push-down,operators combination and query plan tree reconstruction to generate physical plan trees suitable for distributed scenarios.2.In this thesis,a slicing strategy is proposed for distributed operator.An operator slicing method based on data partitioning is designed.Based on the cardinality estimation results and resource conditions,dynamic operator slicing is carried out according to the execution characteristics of the operator to achieve balancing of computational resources in the system and improving the parallelism of operators.A distributed graph topology construction method for graph traversal is designed,and a parallel computation method is designed for multi-loop traversal tasks to improve the execution efficiency of graph topology construction and path computation.3.In this thesis,a distributed graph query and graph computation scheduling method is designed.A two-level scheduling strategy is adopted to construct a pipeline for the query plan tree based on operators dependencies,and through a stage-division method,a dynamic distributed collaborative scheduling of the plan tree is performed based on cardinality estimation and resource conditions to achieve high inter-operator parallelism.An operator distribution strategy based on data proximity is used to make full use of the intermediate cache to reduce network transmission latency.In testing,a complete functional and performance test of the graph database system with this framework was conducted.The test results show that the system performs well on graph query tasks,can support most query requests and has good query performance in multi-session query scenarios. |