
Distributed Graph Database Storage Layer Design and Implementation

Posted on: 2022-06-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Zhang
GTID: 2518306524989669 | Subject: Master of Engineering
Abstract/Summary:
With the rise of knowledge graphs and graph computing, processing and analyzing ultra-large-scale graph data has become a prominent topic in industry. Unlike traditional relational data, graph data represents entities and the relationships between them as vertices and edges, forming a mesh-like network structure. Graph processing typically starts from one vertex and iteratively expands to the surrounding vertices and edges. Distributed graph databases emerged to store ultra-large-scale graph data and support this kind of processing.

Existing distributed graph databases usually separate computation from storage: the upper computation layer queries and computes over graph data, while the lower storage layer stores the data reliably and executes some basic graph query operators. Graph traversal, the most common operation in graph processing, often produces a large number of intermediate results that pass back and forth between the computation and storage layers, adding network overhead and degrading the performance of the graph database. For the storage layer, it is therefore important to study how to implement native graph storage in a distributed environment and reduce the intermediate results generated during traversal, so as to improve the performance of the distributed graph database as a whole. To address these problems, the main work of this thesis is as follows:

1. Design and implementation of native graph data organization and storage. The graph is partitioned with an edge-partitioning algorithm, and each resulting partition is stored as a subgraph in native graph form. Because graph topology data and graph attribute data behave differently during traversal, they are designed and stored separately, which yields a native graph storage method with good performance.
2. Optimization of the existing graph traversal method. In existing graph databases, multi-hop traversal frequently exchanges intermediate results between the computation layer and the storage layer. To solve this, an operator pushdown and intermediate-result caching mechanism is designed and implemented in the storage layer. Intermediate traversal results no longer travel between the two layers; instead, the whole graph computation or query task is advanced through control messages, which reduces network overhead and improves query performance.
3. Design and implementation of distributed graph storage. The design considers both the locality of graph data and load balancing across the cluster, preserving graph locality as far as possible to improve overall system performance while keeping the cluster highly available. Multi-replica and consistency mechanisms are designed and implemented for the distributed graph database so that the cluster can detect load imbalance and rebalance itself.

This thesis also presents a test report for the storage layer of the distributed graph database. The results show that, while distributed and reliable storage is guaranteed, graph traversal performance is significantly improved compared with existing graph databases.
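The first contribution, edge partitioning with topology and attributes stored apart, can be illustrated with a small Python sketch. It assumes a simple vertex-cut in which each edge is placed by a deterministic hash of its endpoints, which is not necessarily the thesis's partitioning algorithm, and the names `Subgraph` and `edge_partition` and the dictionary-based layout are hypothetical. Each edge lands in exactly one partition, a vertex is replicated into every partition holding one of its edges, and adjacency lists (topology) are kept separate from vertex and edge property maps so that a neighbor scan never has to load attributes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Subgraph:
    """One partition stored as a native subgraph (hypothetical layout)."""
    pid: int
    # Topology: compact adjacency lists, the only data a traversal scans.
    out_edges: defaultdict = field(default_factory=lambda: defaultdict(list))
    # Attributes: kept apart and read only when a query asks for them.
    vertex_props: dict = field(default_factory=dict)
    edge_props: dict = field(default_factory=dict)


def edge_partition(edges, vertex_props, num_parts):
    """Assign every edge to exactly one partition (a vertex-cut).

    Vertices are replicated into each partition that holds one of their
    edges. The placement rule, a deterministic hash of the endpoints, is an
    illustrative assumption rather than the thesis's partitioning algorithm.
    """
    parts = [Subgraph(pid=i) for i in range(num_parts)]
    replicas = defaultdict(set)  # vertex id -> ids of partitions holding it
    for src, dst, props in edges:
        pid = (src * 31 + dst) % num_parts
        part = parts[pid]
        part.out_edges[src].append(dst)
        part.edge_props[(src, dst)] = props
        for v in (src, dst):
            replicas[v].add(pid)
            part.vertex_props.setdefault(v, dict(vertex_props.get(v, {})))
    return parts, replicas
```

With the edges (1, 2), (1, 3), (2, 3) and (3, 1) split into two partitions, vertex 1 is replicated in both partitions while vertex 2 lives only in partition 1. A hash rule is the simplest placement to show; a real edge-partitioning algorithm would choose partitions to keep the number of vertex replicas low.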
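The second contribution, operator pushdown with intermediate-result caching, can be sketched as an in-process simulation of a k-hop neighbor query. This is an illustration under assumptions, not the thesis's protocol: `StorageNode`, `step`, `commit` and `k_hop` are hypothetical names, network messages are plain method calls, `replicas` is an assumed map from each vertex to the storage nodes holding a copy of it (as a vertex-cut layout would produce), and the query returns the vertices reached by walks of exactly k edges. The computation layer only sends "advance" control messages. Each storage node caches its current frontier, expands it against its local edges, and hands new frontier vertices directly to the other storage nodes, so only the final frontier returns to the computation layer.

```python
from collections import defaultdict


class StorageNode:
    """Hypothetical storage node that runs traversal steps on local edges."""

    def __init__(self, nid, edges):
        self.nid = nid
        self.out_edges = defaultdict(list)
        for src, dst in edges:
            self.out_edges[src].append(dst)
        self.cache = {}                # query id -> frontier for this hop
        self.inbox = defaultdict(set)  # query id -> frontier for next hop

    def deliver(self, qid, vertices):
        # Storage-to-storage hand-off; the computation layer never sees it.
        self.inbox[qid] |= vertices

    def step(self, qid, cluster, replicas):
        """Handle one 'advance' control message for query `qid`."""
        produced = defaultdict(set)
        for v in self.cache.pop(qid, set()):
            for w in self.out_edges.get(v, ()):
                for nid in replicas[w]:  # every node holding a copy of w
                    produced[nid].add(w)
        for nid, vertices in produced.items():
            cluster[nid].deliver(qid, vertices)

    def commit(self, qid):
        # The next hop's frontier becomes the cached intermediate result.
        self.cache[qid] = self.inbox.pop(qid, set())


def k_hop(cluster, replicas, qid, start, k):
    """Computation layer: issues control messages, never ships frontiers."""
    for nid in replicas[start]:
        cluster[nid].cache[qid] = {start}
    for _ in range(k):
        for node in cluster.values():
            node.step(qid, cluster, replicas)
        for node in cluster.values():
            node.commit(qid)
    # Only the final result travels back to the computation layer.
    result = set()
    for node in cluster.values():
        result |= node.cache.pop(qid, set())
    return result
```

For two nodes holding the edges {(1, 3), (3, 1)} and {(1, 2), (2, 3)}, a one-hop query from vertex 1 returns {2, 3} and a two-hop query returns {1, 3}. The separate inbox and `commit` phase keep hops apart: a node that receives next-hop vertices before it has expanded the current hop does not mix the two frontiers.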
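For the third contribution, multi-replica storage with self-rebalancing, a greedy sketch might look like the following. The replication factor `r`, the load metric (sum of partition sizes per node), the 1.25 imbalance threshold and the function names `place_replicas` and `rebalance` are assumptions for illustration. The sketch ignores graph locality and the consistency protocol the thesis also covers, and only shows how a cluster could notice an imbalance and move replicas without ever co-locating two replicas of one partition.

```python
def place_replicas(partition_sizes, nodes, r):
    """Greedily put each partition's r replicas on the r least-loaded nodes."""
    load = {n: 0 for n in nodes}
    placement = {}
    # Place large partitions first so that smaller ones can fill the gaps.
    for pid in sorted(partition_sizes, key=partition_sizes.get, reverse=True):
        chosen = sorted(nodes, key=lambda n: (load[n], n))[:r]
        for n in chosen:
            load[n] += partition_sizes[pid]
        placement[pid] = set(chosen)
    return placement, load


def rebalance(placement, partition_sizes, load, threshold=1.25):
    """Move replicas from the hottest to the coolest node until balanced.

    Each move keeps the replica count per partition unchanged and never puts
    two replicas of one partition on the same node.
    """
    moves = []
    while True:
        hot = max(load, key=lambda n: (load[n], n))
        cold = min(load, key=lambda n: (load[n], n))
        if load[hot] <= threshold * max(load[cold], 1):
            break
        gap = load[hot] - load[cold]
        # Moving a non-empty partition smaller than the gap strictly lowers
        # the sum of squared loads, so the loop always terminates.
        candidates = [
            pid for pid, holders in placement.items()
            if hot in holders and cold not in holders
            and 0 < partition_sizes[pid] < gap
        ]
        if not candidates:
            break
        # Prefer the partition whose size is closest to half the gap.
        pid = min(candidates, key=lambda p: abs(partition_sizes[p] - gap / 2))
        placement[pid].remove(hot)
        placement[pid].add(cold)
        load[hot] -= partition_sizes[pid]
        load[cold] += partition_sizes[pid]
        moves.append((pid, hot, cold))
    return moves
```

Placing four partitions of sizes 40, 30, 20 and 10 with two replicas each on nodes a, b and c gives loads of 70, 70 and 60; adding an empty node d and calling `rebalance` triggers four replica moves and leaves every node with a load of 50.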
Keywords/Search Tags: graph database, native graph storage, graph traversal, distributed graph storage