Font Size: a A A

Research On Distance Computing In Distributed Graph Data Management System

Posted on:2020-06-30Degree:MasterType:Thesis
Country:ChinaCandidate:Y XuFull Text:PDF
GTID:2370330623951434Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the rapid development of network technology and economy,knowledge graph has emerged quietly in recent years,and has been widely used in various important fields.In these knowledge graph applications,the search of the shortest path always plays a cornerstone role and has a very important significance.It is an inevitable problem in graph calculation.With the continuous growth of data scale in graph,many graph data sets have reached the scale of millions,and will become larger and larger in the foreseeable future.However,the computing power of computers has not increased at the same rate,resulting in a very long response time to the shortest path distance calculation under a single server,which makes it difficult to adapt to the high concurrent network environment.Therefore,we urgently need to find a new algorithm which can adapt to the distributed environment to reduce the response time,instead of the traditional Dijkstra algorithm.In order to solve this challenging problem,we simplify the shortest path calculation to a bounded estimation problem,that is,to calculate the upper or lower bounds of the distance between two points to obtain the estimated distance.In recent years,many foreign researchers have used the framework of "landmark selection" and shortest path tree to calculate the shortest path,which sacrifices some accuracy and achieves higher response speed.However,these methods can only adapt to a certain type of graph,and have no good universality.On the other hand,they do not perform well in obtaining the lower bound of distance,but calculating the distance between two points through the lower bound can make better use of the nature of the shortest path tree and obtain more accurate results.Moreover,the estimation of the lower bound of the shortest path distance is of great practical significance in both academic and industrial circles.In addition,we also apply the algorithm in the distributed environment to avoid the problems of poor read-write performance and slow computing speed of single-machine memory.This topic comes from my teacher's advice and my personal interest.The main work of the article includes:(1)By consulting and researching related technologies,a landmark selection framework based on lower bound and an evaluation criterion for landmark are proposed,which can make good use of the nature of the shortest pathtree.(2)A heuristic landmark selection algorithm is proposed to select the optimal landmark set.(3)We implement our algorithm through distributed graph processing system,which proves that our algorithm has good horizontal scalability.(4)In order to ensure the universality of our method,we have done a lot of experiments on several million-level real graph datasets of different types.The results of the experiment show that the proposed algorithm can efficiently respond to user's shortest path distance queries online in milliseconds,and can obtain better accuracy than other similar algorithms,and maintain good stability on different types of data sets.
Keywords/Search Tags:Graph Calculation, Knowledge Graph, Shortest Path Distance, Distributed Graph Processing, Landmark
PDF Full Text Request
Related items