Graph Reachability Distributed Computing And Application Based On Spark

Posted on: 2017-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: C L Jiang
Full Text: PDF
GTID: 2348330503972507
Subject: Computer technology
Abstract/Summary:
Reachability queries are among the most common operations on graphs and are widely used in many fields. As the volume of web data grows, graphs are expanding rapidly, which makes computing reachability queries on large-scale graphs a new challenge. Spark is a distributed computing framework proposed in recent years. Because it computes in memory, Spark has natural advantages for interactive and iterative computation, which suits the development of distributed reachability queries on large-scale graphs.

This thesis designs a parallel reachability computing method on the Spark platform. Before any reachability query is answered, the graph data is transformed into a property graph, from which the strongly connected components are extracted and the directed acyclic graphs are formed. Algorithms for reachability queries and for the shortest reachability path are then designed. The method is optimized by exploiting the parallelism and caching of resilient distributed datasets (RDDs) on the distributed platform.

The distributed algorithm is applied to social network analysis scenarios. First, the social network data is transformed into graph data. Second, a hotspot query method for computing friend relations is proposed. Then, a reasonable recommended route is provided. Finally, visualization techniques are used to draw the relationships between friends as a graph.

Three experiments are designed on graphs of different scales. The first compares a Spark cluster with a stand-alone platform and illustrates the performance advantage of the Spark cluster on graph data at the scale of millions. The second shows the effect of average degree on reachability queries. The third shows that the hotspot query method improves query speed.
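
The preprocessing and query steps described in the abstract can be illustrated with a minimal single-machine sketch. This is not the thesis's Spark implementation: it is plain Python, it assumes Kosaraju's algorithm for extracting strongly connected components (the abstract does not name the algorithm used), and the function names and the small example graph are hypothetical. Collapsing each strongly connected component into one vertex always yields a directed acyclic graph, so a reachability query reduces to a breadth-first search over that smaller graph.

```python
from collections import deque

def strongly_connected_components(graph):
    """Map each vertex to a component label (Kosaraju's algorithm, iterative)."""
    vertices = set(graph)
    for targets in graph.values():
        vertices.update(targets)
    reverse = {v: [] for v in vertices}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)

    # Pass 1: record vertices in order of DFS finish time on the original graph.
    visited, order = set(), []
    for root in vertices:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: traverse the reversed graph in decreasing finish time.
    component = {}
    for root in reversed(order):
        if root in component:
            continue
        component[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if nxt not in component:
                    component[nxt] = root
                    stack.append(nxt)
    return component

def condense(graph, component):
    """Build the DAG whose vertices are the component labels."""
    dag = {label: set() for label in set(component.values())}
    for u, targets in graph.items():
        for v in targets:
            if component[u] != component[v]:
                dag[component[u]].add(component[v])
    return dag

def is_reachable(dag, component, source, target):
    """Answer a reachability query by BFS over the condensed DAG."""
    start, goal = component[source], component[target]
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in dag[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def shortest_path(graph, source, target):
    """Return a fewest-edges path in the original graph, or None."""
    parent, queue = {source: None}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical example: a, b, c form a cycle that leads on to d and then e.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": []}
component = strongly_connected_components(graph)
dag = condense(graph, component)
print(is_reachable(dag, component, "a", "e"))
print(shortest_path(graph, "a", "e"))
```

In a distributed setting the same stages would operate on partitioned vertex and edge collections rather than in-memory dictionaries, and the condensed graph is a natural candidate for caching because every query reuses it.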
Keywords/Search Tags: distributed computing method, reachability query, graph parallel computing, social network analysis