
The Research Of Graph Computing Framework Supporting Spatial And Temporal Data Management Based On Spark

Posted on: 2019-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Wang
Full Text: PDF
GTID: 2428330545453685
Subject: Computer science and technology
Abstract/Summary:
With the continuous development of social media, network communication generates massive amounts of data every moment. With this exponential growth, "big data" has become a major trend in information technology. For the large volumes of data generated by mobile social networking devices, graphs are a natural and intuitive tool for analyzing the relationships reflected in the data. As a result, the use of spatio-temporal graphs for data analysis has attracted growing attention from researchers in recent years. In many practical scenarios, however, only part of the data needs to be analyzed. In other words, for a given query, the spatio-temporal graph needs to return a sub-graph that satisfies the query conditions. For example, in a case such as the investigation of the New York explosion, the government only needs to analyze data from the affected local area rather than data for the whole of New York City. Scenarios of this kind motivate the study of fast spatio-temporal sub-graph construction.

To construct and analyze such sub-graphs, existing distributed graph computing frameworks traverse all of the graph data and compare every record against the query conditions to find the required sub-graph, which is wasteful. At the same time, real application data exhibits spatial locality: communication between nearby locations is frequent. To address these two problems, we propose two optimized distributed graph computing frameworks, SpatialGraphx and GeoGraphx. SpatialGraphx is built on GraphX, and GeoGraphx is an optimized version of SpatialGraphx. Leveraging the spatial and temporal attributes of the data, SpatialGraphx extends GraphX in two ways: it builds a spatio-temporal tree index to speed up partial graph construction, and it uses a new location-based partition strategy to speed up computation. To test the performance of SpatialGraphx, we run experiments on mobile call data with hundreds of millions of edges. The model effectively supports fast spatio-temporal sub-graph construction and analysis. Compared with the original GraphX, SpatialGraphx achieves an improvement ranging from 3x to several orders of magnitude on sufficiently large datasets.

Following further research, this thesis also proposes GeoGraphx to address defects in SpatialGraphx, namely load imbalance in data management and in query operations. GeoGraphx makes two main contributions. The first is an optimized data management mechanism with a quad-tree index, which balances the load across the nodes of the cluster and maximizes the parallelism of each node in sub-graph query operations. The second is the addition of graph operation APIs. Experiments show that GeoGraphx performs better than both SpatialGraphx and GraphX.
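The idea of answering a sub-graph query through a spatial index rather than a full scan can be illustrated with a small single-machine sketch. This is not the thesis's implementation: the `Edge` fields, the `QuadTree` class, its `capacity` and `max_depth` parameters, and the sample data are all hypothetical, and the time condition is simply checked at the leaves. The sketch only shows how a quad-tree lets a query skip regions that cannot contain matching edges.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass(frozen=True)
class Edge:
    # Hypothetical edge record: two vertex ids plus a location and a timestamp.
    src: int
    dst: int
    x: float
    y: float
    t: int

def contains(box: Box, x: float, y: float) -> bool:
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]

def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

class QuadTree:
    def __init__(self, box: Box, capacity: int = 4, depth: int = 0, max_depth: int = 16):
        self.box, self.capacity = box, capacity
        self.depth, self.max_depth = depth, max_depth
        self.edges: List[Edge] = []
        self.children: List["QuadTree"] = []

    def insert(self, e: Edge) -> bool:
        if not contains(self.box, e.x, e.y):
            return False
        if not self.children:
            # Leaf: store here unless it is full and may still be split.
            if len(self.edges) < self.capacity or self.depth >= self.max_depth:
                self.edges.append(e)
                return True
            self._split()
        # any() stops at the first child that accepts the edge.
        return any(c.insert(e) for c in self.children)

    def _split(self) -> None:
        xmin, ymin, xmax, ymax = self.box
        mx, my = (xmin + xmax) / 2, (ymin + ymax) / 2
        quads = [(xmin, ymin, mx, my), (mx, ymin, xmax, my),
                 (xmin, my, mx, ymax), (mx, my, xmax, ymax)]
        self.children = [QuadTree(q, self.capacity, self.depth + 1, self.max_depth)
                         for q in quads]
        old, self.edges = self.edges, []
        for e in old:
            any(c.insert(e) for c in self.children)

    def query(self, box: Box, t0: int, t1: int,
              out: Optional[List[Edge]] = None) -> List[Edge]:
        out = [] if out is None else out
        if not intersects(self.box, box):
            return out  # prune: nothing in this region can match
        out.extend(e for e in self.edges
                   if contains(box, e.x, e.y) and t0 <= e.t <= t1)
        for c in self.children:
            c.query(box, t0, t1, out)
        return out

# Usage with made-up data: edges in the unit square, queried by area and time window.
edges = [Edge(1, 2, 0.10, 0.10, 10), Edge(2, 3, 0.20, 0.15, 20),
         Edge(3, 4, 0.90, 0.90, 15), Edge(4, 5, 0.15, 0.12, 99),
         Edge(5, 6, 0.50, 0.50, 12), Edge(6, 7, 0.12, 0.18, 11)]
tree = QuadTree((0.0, 0.0, 1.0, 1.0), capacity=2)
for e in edges:
    tree.insert(e)

query_box = (0.0, 0.0, 0.25, 0.25)
result = tree.query(query_box, 0, 50)
# Baseline for comparison: compare every edge against the query conditions.
full_scan = [e for e in edges if contains(query_box, e.x, e.y) and 0 <= e.t <= 50]
```

The indexed query and the full scan return the same edges, but the indexed query never visits subtrees whose bounding box lies outside the query area. In a distributed setting, the same spatial cells could also serve as a basis for assigning edges to partitions, which is one way to read the location-based partitioning described above.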
Keywords/Search Tags: Spark, Graph Computation, Distribution, Partial Graph