Font Size: a A A

Design And Implementation Of A Graph Traversal Framework For Distributed Graph Storage

Posted on:2022-12-14Degree:MasterType:Thesis
Country:ChinaCandidate:X ZhengFull Text:PDF
GTID:2518306764980189Subject:Computer Software and Application of Computer
Abstract/Summary:PDF Full Text Request
The database industry has encountered significant changes in just a few decades,from the introduction of the first database management system,Integrated Data Store(IDS),to the current blossoming of the database market,which owes to the booming PC and mobile device markets,as well as the information explosion in the twenty-first century.In recent years,with the newly data growth,the proportion of various unstructured data is increasing,the net shape of Internet information data is becoming more and more clear,and the mining of various valuable information under the net structure has become the focus of attention.Graph databases,in comparison to relational databases,have a more accurate representation of the real world and more efficient correlation queries,giving them a significant edge in mining deep correlation data.With the increasing volume of data,graph databases with distributed storage will become more common in the future.Distributed storage transforms single-machine graph traversal into distributed graph traversal,introducing issues such as subgraph traversal leaping.The current basic solution for graph traversal in distributed environments is based on various big data platforms,which are characterized by partitioning with data sources and following a common set of execution patterns,so there is an unavoidable problem of additional data transformation overhead and difficulty in optimizing graph traversal scenarios.For that matter,this thesis designs and implements a Graph Traverse Framework(GTF)for distributed graph storage,which is mainly oriented to the scenario of chain graph traversal in the graph query function of distributed graph database.The main work and innovations of this thesis are as follows:1)This thesis discusses the classification of current graph databases,analyzes the current mainstream graph databases and the advantages and disadvantages of each,and proposes GTF and completes the overall design to address the shortcomings of current distributed graph databases in the direction of chain graph traversal.2)This thesis presents an asynchronous execution paradigm for GTF as well as a tree management method in this work.The asynchronous execution mode employs a method,which is similar to flooding route,to ensure that individual node execution does not conflict with one another,allowing individual nodes to traverse without waiting for global communication at each overstep.The tree management method is a task management strategy that uses a tree structure to balance the system's network load during distributed graph traversal with nearly no increase in network cost.3)This thesis proposes and implements two optimization approaches based GTF: data distribution and delayed forwarding.Data distribution can effectively reduce data migration between compute layer nodes;Delayed forwarding reduces cluster network overhead in the middle and late stages of a distributed traversal.4)This thesis carries out basic implementation of GTF and analyses the test result.Tests have shown that GTF is capable of performing basic functions and outperforms other databases in hop-path query.
Keywords/Search Tags:Graph database, Graph traverse framework, Asynchronous execution, Tree Management
PDF Full Text Request
Related items