Font Size: a A A

Large Scale Multi-graph K Hop-constrained Path Enumeration

Posted on:2021-05-12Degree:MasterType:Thesis
Country:ChinaCandidate:J ZhaoFull Text:PDF
GTID:2480306050970529Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the development of information technology,more and more data captured in the form of graph.The graph data structure applied in many areas such as social networks,bioinformatics,and e-commerce.As a special data structure,graph data has many distinctive features,such as large data size,complex topological structure,and high computational complexity.Naturally,the query processing on the graph is much more difficult than traditional data types(e.g.,relational data and XML data,etc.).One of the fundamental tasks in graph analysis is to investigate the relations between two nodes in the graph such as how a node A influences the other node B,or the interaction chain between two substances.Therefore,one of the key ideas to solve the above problem is the path enumeration problem between two nodes.For this purpose,we study the problem of k hop-constrained s-t path enumeration in this thesis,which aims to list all simple paths from a source node s to a target node t within k hops.The main obstacle to solving the k hop-constrained s-t path enumeration problem is the huge result scale even for a small k,and this is also a resource-consuming problem.It is less efficient in terms of response time and memory usage if we enumerate all s-t paths with duplicates and loops.For this purpose,in order to solve this unacceptable disadvantage,this thesis proposes the relation optimization in path enumeration on multi-graph and two kind of k hop-constrained s-t path enumeration algorithm,namely P-DFS and HPP-DFS.The relation optimization in s-t path enumeration on multi-graph aims to reduce the complexity of the graph.This thesis organizes large-scale graph data reasonably.For given nodess and t,the number of the hop constrain k,we further reduce the graph by delete some nodes and edges earlier that are unlikely to be appeared in the result set.In this way,the IO overhead and search space during the path enumeration are greatly shortened.P-DFS is a Depth First Search based algorithm.it can enumerate all the s-t paths within k hops.The general idea of this approach is "terminate the program early as soon as possible".In order to reduce the number of recursive queries of which explorer fruitless branches,P-DFS using pruning approach to avoid the useless attempt according to the shortest path information.The key idea of HPP-DFS is "use the previous query results to avoid redundant calculations".In the real world,the degree distribution of the nodes in the graph is uneven,that is,some of the nodes have a very large out-degree(in-degree),on the contrary,some have a small out-degree(in-degree).And,P-DFS takes s-t paths one by one during enumeration,the path information that has been calculated in the previous stage is not efficiently used.During the enumeration process,if such a node with a large degree is encountered,the number of edges we will be accessed increase greatly.Inevitably,there will be a large number of redundant calculations.Therefore,we take the approach of sharing the results of some intermediate paths contained hot-point that already have been calculated before when designing HPP-DFS,and the response time is further shortened by this way.Theoretically,P-DFS and HPP-DFS are polynomial delay algorithms with 0(km)time complexity.In this thesis,our comprehensive experiments on real-life networks demonstrate the superior performance of the P-DFS and HPP-DFS algorithm compared to the state-of-the-art techniques based on the relation optimization in path enumeration on multi-graphs as mentioned before.It is also reported that the two algorithms have high feasibility and high accuracy.
Keywords/Search Tags:Graph Analysis, Graph Reduction, s-t Path, Hop Point, Path Enumeration
PDF Full Text Request
Related items