Font Size: a A A

Shortest Distance Query Algorithm Research On Large-scale Graph Data

Posted on:2020-04-05Degree:MasterType:Thesis
Country:ChinaCandidate:W LvFull Text:PDF
GTID:2428330572499304Subject:Engineering
Abstract/Summary:PDF Full Text Request
The shortest distance query is one of the most basic operations on the graph,and is widely used in WWW,Internet,social networks,economic networks,power networks,transportation networks,and neural networks.As the scale of the graph in the application field becomes larger and larger,the traditional shortest distance algorithm cannot meet the requirements of real-time response and low memory consumption.At present,the shortest distance query method commonly used for large-scale graph data is based on the query method of landmark nodes.However,there are still some shortcomings in the existing methods of this class.First,the landmark node selection strategy cannot satisfy both centrality and efficiency.Second,the memory consumption is large.Aiming at these problems,this paper proposes a new shortest distance query algorithm based on landmark nodes.The main work and innovations of this paper are as follows:?1?For the problem that the landmark selection strategy cannot meet the centrality and efficiency at the same time,a new landmark selection strategy is proposed.First,each node in the graph is judged,and the number of nodes in the radius r is greater than the node core node of k.Then the connected backward core node and the forward core node are used as the backward landmark candidate node and the forward landmark candidate node,and finally the backward clustering and forward clustering are performed using the idea of DBSCAN[19]clustering algorithm,each time.There are no landmark candidate nodes in the cluster that are clustered into other classes as landmark nodes.?2?Aiming at the problem of large memory consumption,a new coding scheme is designed.The coding scheme uses triplet tags?type,node,dis?,which are used to indicate the type of node,the node connected to it,and the distance information..The normal node stores distance information between the backward landmark nodes in the class;the backward landmark node stores distance information of the forward landmark nodes that can be reached;and the forward landmark node stores distance information between the normal nodes in the class.?3?Propose a new query mode:traverse the backward landmark node that the source node can reach,and determine whether the forward landmark node that the backward landmark node can reach includes the target node.If not included,the two points are unreachable and the distance is positive infinity;if included,the shortest distance between the two points is calculated.The shortest distance calculation between the two points has four methods:upper bound,lower bound,mean and equal ratio.It is proved by experiments that the minimum mean of the sum of the upper and lower bounds is used as the estimated value,and the average relative error of the experimental results is lower.The theory proves that the landmark node selection strategy is reduced in time complexity and space complexity compared with the traditional method,and is suitable for large-scale graph data.At the same time,the query time complexity is reduced by an order of magnitude due to the decrease of query volume.It is proved by experiments that the radius r and k are inversely proportional to the number of backward landmark nodes and forward landmark nodes.The average relative error is the same as the traditional method error,but the storage consumption is small,indicating that the method in this paper is suitable for large-scale graphs data.
Keywords/Search Tags:large-scale, graph data, preprocess, shortest distance query
PDF Full Text Request
Related items