Hybrid Graph Query And Graph Computing Engine For Distributed Graph Database

Posted on: 2022-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: C H Liu
Full Text: PDF
GTID: 2518306524980219
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of the internet has reached every industry, and the volume of unstructured data keeps growing. This has created strong demand for analyzing the relationships between data in fields such as knowledge graphs and social networks. Compared with traditional relational databases and traditional big data processing systems, graph systems have great advantages in processing massive associated data.

Graph systems can be divided into two types. One is the graph database, which queries only part of the graph data and must return query results quickly and with low latency. The other is the graph computing system, which usually accesses the whole graph and iterates many times, so it involves a huge amount of computation and takes a long time. Most current research studies graph databases and graph computing systems separately. In practical applications, however, graph query and graph computing coexist, and they have much in common, such as graph storage, graph partitioning, and graph indexing.

To address these problems, this thesis designs and implements HCQ-GDB, a hybrid graph system engine for graph query and graph computing. The thesis discusses a series of problems, including the waste of graph storage space, the cost of data transmission, the cost of maintaining data consistency, and the utilization of caches and indexes. Combining graph computing with graph query also makes it possible to complete more advanced and complex computing tasks. The main contents and innovations of this thesis are as follows:

1. Unified execution mode for graph query and graph computing, and the overall design of the hybrid graph system engine: To unify the execution mode of graph query and graph computing and make efficient use of system resources, a series of distributed operators is designed that transform their computing logic into a DAG (directed acyclic graph) physical execution plan for task scheduling. The overall architecture design of the hybrid graph system is given.

2. Graph storage model and cache scheme: This thesis designs a graph storage model for complex unstructured graph data. The model helps reduce space cost and accelerates graph query and graph computing. Because graph query and graph computing issue different IO requests, a cache scheme is proposed to meet the data read and write requirements of both.

3. Improved graph partition algorithm and task scheduling optimization model: In the distributed environment, an optimized graph partition algorithm based on the principle of data proximity is implemented to reduce the network communication overhead between tasks. Because graph query is latency-sensitive and graph computing involves a large amount of computation, a distributed task scheduling optimization model is implemented to improve overall system performance.

4. Hybrid synchronous and asynchronous model for graph computing: Depending on whether they execute synchronously or asynchronously, graph algorithms differ in data consistency, convergence, and execution cost. On the premise of ensuring data consistency, a hybrid synchronous and asynchronous model is designed to improve the graph computing performance of the system.

For evaluation, this thesis conducts complete benchmark function tests and benchmark performance tests on the hybrid graph system engine. The function test results show that the system supports most graph query requests and graph computing requests, and that the related algorithms and core technologies execute normally and produce correct results. Compared with other graph databases and graph computing systems, performance is clearly improved.
Keywords/Search Tags: distributed graph query, distributed graph computing, graph partition, graph storage, scheduling optimization
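
The idea in the abstract of expressing both a graph query and a graph computation as operators in one DAG physical execution plan can be illustrated with a minimal, single-process Python sketch. Everything here is an assumption made for illustration: the toy graph, the operator names (`scan`, `expand`, `neighbours_of_1`, `out_degree`) and the `run_plan` helper are hypothetical and are not taken from HCQ-GDB, whose operators are distributed and whose scheduler is not described on this page.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy graph: vertex -> list of out-neighbours.
GRAPH = {1: [2, 3], 2: [3], 3: [4], 4: []}


def scan(_inputs):
    """Source operator: emit every vertex id."""
    return list(GRAPH)


def expand(inputs):
    """Expand each vertex into its outgoing edges (src, dst)."""
    (vertices,) = inputs
    return [(v, n) for v in vertices for n in GRAPH[v]]


def neighbours_of_1(inputs):
    """Query-style operator: touches only the edges leaving vertex 1."""
    (edges,) = inputs
    return sorted(dst for src, dst in edges if src == 1)


def out_degree(inputs):
    """Computing-style operator: aggregates over the whole graph."""
    (edges,) = inputs
    counts = {v: 0 for v in GRAPH}
    for src, _dst in edges:
        counts[src] += 1
    return counts


# Assumed plan format: operator name -> (function, upstream operator names).
# The query and the computation share the "scan" and "expand" stages.
PLAN = {
    "scan": (scan, []),
    "expand": (expand, ["scan"]),
    "query": (neighbours_of_1, ["expand"]),
    "compute": (out_degree, ["expand"]),
}


def run_plan(plan):
    """Run operators in dependency order and return every operator's output."""
    sorter = TopologicalSorter({name: deps for name, (_fn, deps) in plan.items()})
    results = {}
    for name in sorter.static_order():  # upstream operators come first
        fn, deps = plan[name]
        results[name] = fn([results[d] for d in deps])
    return results


if __name__ == "__main__":
    out = run_plan(PLAN)
    print(out["query"])
    print(out["compute"])
```

In this sketch the query and the computation reuse the same upstream operators, which is one way a unified plan can avoid reading the graph twice. A real engine would additionally place operators on different machines and schedule them by cost, which this example does not attempt.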