Research On A Distributed Graph Data Process Mechanism Based On Spark

Posted on: 2018-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: T Q Yang
Full Text: PDF
GTID: 2428330518958876
Subject: Computer application technology
Abstract/Summary:
Compared with linear lists and trees, graph data structures are more expressive, both structurally and semantically, and they capture many real-world situations more naturally, so the study of graph data is important. Graph data processing is challenging because real-world entities keep multiplying and large-scale applications generate huge amounts of data. Most traditional graph representations use an adjacency matrix. This method expresses the graph structure intuitively and is easy to compute with. However, as data volume grows dramatically, it becomes very inefficient in both storage and computation. Many large-scale data processing systems have emerged in this era of rapid data growth and diverse processing requirements, but when used for graph processing they suffer from extremely low memory utilization, poor reuse of intermediate results, and heavy disk I/O. Focusing on large-scale graph data processing, we propose an efficient vector-based graph representation method and a Spark-based mechanism for processing massive graphs. This thesis also addresses the efficiency of the node vector learning process. With the aim of reducing memory overhead as far as possible while keeping the computational cost reasonable, we train on the original graph structure to map each node to a low-dimensional vector. This avoids the shortcomings of the matrix structure, and the amount of computation does not increase significantly in this model. To handle large-scale graph data, we propose a graph partitioning optimization for Apache Spark, an in-memory distributed computing framework. Considering the high I/O cost of processing complex relationships in this framework, we add a memory-based graph data cache layer between the computing layer and the storage layer. Finally, in the experiments, we compare the proposed approach with the Skip-gram and SVD algorithms and show that it outperforms both in memory consumption and graph representation accuracy.
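The storage argument in the abstract can be illustrated with a short back-of-the-envelope sketch. It is not taken from the thesis: the node count, the vector dimension, and the per-entry sizes below are all assumed values chosen only to show how a dense adjacency matrix grows quadratically with the number of nodes while a table of low-dimensional node vectors grows linearly.

```python
def adjacency_matrix_bytes(num_nodes: int, bytes_per_entry: int = 1) -> int:
    """Storage for a dense num_nodes x num_nodes adjacency matrix."""
    return num_nodes * num_nodes * bytes_per_entry


def node_vectors_bytes(num_nodes: int, dim: int, bytes_per_value: int = 4) -> int:
    """Storage for one dim-dimensional vector per node (e.g. 32-bit floats)."""
    return num_nodes * dim * bytes_per_value


# Assumed, illustrative sizes (not figures reported in the thesis).
n = 1_000_000   # hypothetical number of nodes
d = 128         # hypothetical vector dimension

matrix = adjacency_matrix_bytes(n)   # quadratic in n
vectors = node_vectors_bytes(n, d)   # linear in n
ratio = matrix / vectors

print(f"dense adjacency matrix: {matrix / 1e9:.1f} GB")
print(f"node vectors:           {vectors / 1e9:.3f} GB")
print(f"matrix / vectors:       {ratio:.0f}x")
```

Under these assumptions the dense matrix needs about 1000 GB even at one byte per entry, while the vector table needs about half a gigabyte. Sparse adjacency formats would narrow this gap, so the comparison applies only to the dense matrix representation the abstract refers to.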
Keywords/Search Tags: graph data processing, graph computing, Spark, distributed computation