Research On A Distributed Graph Data Process Mechanism Based On Spark

Posted on:2018-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:T Q Yang

Full Text:PDF

GTID:2428330518958876

Subject:Computer application technology

Abstract/Summary:

Compared with linear table and tree structures, the graph data structure is more expressive and can model many real-world situations more faithfully, so the study of graph data is important. Graph data processing is challenging because real-world entities keep expanding and applications generate huge volumes of data. Most traditional graph representations use an adjacency matrix. This method expresses the graph structure intuitively and is easy to compute with. However, as data volume increases dramatically, this traditional representation becomes very inefficient in both storage and computation. Many large-scale data processing systems have emerged in an era of rapid data growth and increasingly diverse data processing requirements, but these systems suffer from extremely low memory utilization, poor reuse of intermediate results, and heavy disk I/O. Focusing on large-scale graph data processing, we propose an efficient vector-based graph representation method and a Spark-based mechanism for processing massive graphs. This thesis also focuses on the efficiency of the node vector learning process. With the aim of reducing memory overhead as far as possible at a reasonable computational cost, we map each node in the graph to a low-dimensional vector by training on the original graph structure. This avoids the shortcomings of the matrix structure without significantly increasing the amount of computation. To handle large-scale graph data, we propose a graph partitioning optimization for Apache Spark, an in-memory distributed computing framework. To address the high I/O cost the framework incurs when processing complex relationships, we add an in-memory graph data cache layer between the computing layer and the storage layer. Finally, in the experiments, we compare the proposed approach with the Skip-gram and SVD algorithms and show that it outperforms both in memory consumption and graph representation accuracy.
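
The storage argument in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the function names and the sizes used (one million nodes, 128 dimensions) are hypothetical and chosen only for illustration. The sketch counts how many values a dense adjacency matrix stores compared with a table of low-dimensional node vectors.

```python
def adjacency_matrix_cells(num_nodes: int) -> int:
    """Values stored by a dense adjacency matrix: one per ordered node pair."""
    return num_nodes * num_nodes


def embedding_cells(num_nodes: int, dim: int) -> int:
    """Values stored when each node is mapped to a dim-dimensional vector."""
    return num_nodes * dim


def build_adjacency_matrix(num_nodes, edges):
    """Dense 0/1 matrix for an undirected graph given as (u, v) index pairs."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


if __name__ == "__main__":
    # Hypothetical sizes, not taken from the thesis.
    n, d = 1_000_000, 128
    print("adjacency matrix values:", adjacency_matrix_cells(n))
    print("node vector values:     ", embedding_cells(n, d))
```

The matrix grows quadratically with the number of nodes, while the vector table grows linearly for a fixed dimension, which is the motivation the abstract gives for the vector-based representation.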

Keywords/Search Tags:

graph data processing, graph computing, Spark, distributed computation

Related items

1	The Research Of Graph Computing Framework Supporting Spatial And Temporal Data Management Based On Spark
2	SimRank Computation On Large Graphs Based On Spark
3	Cluster Based Large-scale Distributed Graph Processing System
4	Hybrid Graph Query And Graph Computing Engine For Distributed Graph Database
5	Performance Optimization Of Distributed Graph Computation Framework Based On BSP Model
6	Efficient Algorithm For Mining Dense Subgraphs In Uncertain Graph
7	Design And Implementation Of Distributed Graph Computing Engine
8	Research On Balanced Graph Partitioning Based On Vertex Cut And Its Implementation On Spark
9	Graph Reachability Distributed Computing And Application Based On Spark
10	Research On Performance Optimization For Distributed Graph Computation