Design And Implementation Of Critical Technologies In Distributed Graph Database

Posted on: 2021-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Li
GTID: 2428330623968542
Subject: Engineering
Abstract/Summary:
Today's Internet systems generate massive amounts of unstructured, interconnected data. Because these systems evolve continuously, the data often lack a unified format, object types change constantly, and the relationships between objects are complex and variable. Graph databases, a new type of database designed specifically to store graph data, are well suited to this scenario and are a current research hotspot.

The difficulties and bottlenecks in graph database development are concentrated in two areas. First, how to design a database on native graph storage, rather than simply wrapping graph semantics around the storage engine of another type of database. If the storage layer does not take the proximity and relationship queries of graph data into account, performance will not necessarily improve substantially. Second, how to provide a computing layer with rich graph query capabilities, so that users can query the database in graph terms. Other types of query languages are inherently weak at expressing graph queries and are not sufficient for querying graph databases.

This thesis proposes a high-performance distributed graph database architecture that attempts to solve the following problems:

1. How to efficiently index vertices and edges based on the proximity of graph data in a distributed scenario. This thesis uses a hash-based data distribution algorithm to divide the graph into multiple shards, and accelerates vertex and edge indexing with a specially designed data storage format and by physically storing related data in nearby locations.

2. How to implement a high-performance and scalable (in both deployment and implementation) computing layer framework in a distributed scenario. This thesis uses an additional abstraction layer and a table-based data abstraction so that all operators share a common event-based, high-performance asynchronous programming environment, which simplifies their development and design.

3. How to design and implement operators in the computing layer so that they can efficiently express and execute the Cypher graph query language. This thesis proposes dedicated algorithms for generating logical and physical execution plans, as well as an operator scheduling algorithm for distributed systems, so that two-level relationship queries over millions of nodes complete within 100 ms.

This thesis also gives the detailed implementation and test report of the computing layer in the distributed graph database architecture. The implementation of the storage layer is omitted because of space constraints and the division of work within the project; it will be presented in a separate thesis.
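As an illustration of the hash-based data distribution mentioned in the abstract, the following Python sketch assigns each vertex to a shard by hashing its ID and stores every edge on the shard of its source vertex, so that a vertex and its outgoing edges sit together. This is only a minimal sketch under stated assumptions: the abstract does not specify the hash function, the shard count, or the storage format, so `NUM_SHARDS`, `shard_of`, `partition`, and `two_hop` are hypothetical names and do not represent the thesis's implementation.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_of(vertex_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a vertex ID to a shard using a stable hash.

    SHA-256 is an assumption; it is used here because Python's built-in
    hash() is salted per process and would not give stable placement.
    """
    digest = hashlib.sha256(vertex_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(edges, num_shards: int = NUM_SHARDS):
    """Store each edge on the shard of its source vertex.

    Returns {shard_index: {source_vertex: [destination_vertex, ...]}}.
    """
    shards = defaultdict(lambda: defaultdict(list))
    for src, dst in edges:
        shards[shard_of(src, num_shards)][src].append(dst)
    return shards


def two_hop(shards, start: str, num_shards: int = NUM_SHARDS):
    """Two-level neighbour query: one shard lookup per frontier vertex."""
    def neighbours(vertex):
        return shards[shard_of(vertex, num_shards)].get(vertex, [])

    result = set()
    for middle in neighbours(start):
        result.update(neighbours(middle))
    return result


edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e")]
shards = partition(edges)
reachable = two_hop(shards, "a")  # vertices exactly two hops from "a"
```

Because all outgoing edges of a vertex live on one shard, expanding a vertex touches a single shard; a two-level query then needs one lookup for the start vertex plus one per first-level neighbour.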
Keywords/Search Tags: graph database, database, distributed system, non-relational database, high performance