The Design And Implementation Of Large Scale Graph Processing System Based On Improved Hadoop

Posted on:2020-08-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y P Zhang

GTID:2370330575487082

Subject:Software engineering

Abstract/Summary:

"Intelligent transportation" and "social networks" are becoming increasingly popular, and handling the complex graph structures behind these areas has become an urgent problem. For example, computing the shortest route between two locations in intelligent transportation can be abstracted as computing the shortest path between two vertices in a large-scale graph, and judging whether two users in a social network are connected by a social relationship can be abstracted as deciding whether one vertex is reachable from another in a large-scale graph. With the development of "big data" and "cloud computing", graphs keep growing in scale, and a single computer node can no longer store a large-scale graph. Distributed storage architectures have therefore been proposed.

Graph computing currently concentrates on two computing models. One is the MapReduce model, which decomposes graph computation into a map stage and a reduce stage; the well-known Hadoop is a distributed framework that implements it. The other is the BSP model, and a distributed framework that implements it is Hama. Comparing the characteristics of the two models shows that MapReduce offers a higher level of abstraction and generality, and its interface is mature and easy to program, but it does not support explicit iteration or real-time computing. The BSP model introduces the concept of the "superstep" to accelerate computation, but it places high demands on the memory of the compute nodes in the cluster. This thesis therefore combines the advantages of the two and improves Hadoop with the characteristics of "MapReduce model + BSP model", so that Hadoop can serve as a distributed graph processing framework with explicit iteration.

Earlier work has improved first-generation Hadoop and successfully added support for explicit iteration, which improves efficiency. However, first-generation Hadoop now has very few users. Second-generation Hadoop solves two problems of the first generation by introducing Yarn. First, scalability: in the first generation, JobTracker combines resource management with job control, which becomes the bottleneck that restricts system expansion. Second, availability: JobTracker is a single point of failure, and its failure makes the whole cluster unavailable. This thesis therefore builds its improvements on second-generation Hadoop. On top of the improved Hadoop framework, it implements a single-source shortest path algorithm, a reachability query algorithm, and an optimal path algorithm on time-series graphs, and it publishes the large-scale graph processing system for users. The system should be of considerable value to researchers in graph computing.
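The superstep-based iteration mentioned in the abstract can be illustrated with a small sketch. The Python below is not the thesis's implementation, which targets second-generation Hadoop; it is a single-process simulation written under the assumption that each vertex only reacts to messages received in the previous superstep and that edge weights are non-negative. The function names `sssp_supersteps` and `reachable` and the adjacency-dictionary input format are hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def sssp_supersteps(edges, source):
    """Single-source shortest paths in a BSP-like superstep loop.

    edges: dict mapping vertex -> list of (neighbor, weight) pairs.
    Weights are assumed to be non-negative.
    Returns (dist, supersteps).
    """
    # Collect every vertex that appears as a key, a neighbor, or the source.
    vertices = set(edges) | {source}
    for neighbors in edges.values():
        for neighbor, _ in neighbors:
            vertices.add(neighbor)

    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}  # messages to be delivered in the first superstep
    supersteps = 0

    while inbox:
        outbox = defaultdict(list)
        # Compute phase: only vertices that received messages are active.
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                dist[vertex] = best
                # Communication phase: propose new distances to neighbors.
                for neighbor, weight in edges.get(vertex, []):
                    outbox[neighbor].append(best + weight)
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        supersteps += 1

    return dist, supersteps


def reachable(edges, source, target):
    """Reachability query: target is reachable if its distance is finite."""
    dist, _ = sssp_supersteps(edges, source)
    return dist.get(target, math.inf) < math.inf


if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": [], "d": [("a", 1)]}
    distances, steps = sssp_supersteps(graph, "a")
    print(distances, steps)
    print(reachable(graph, "a", "d"))
```

The loop ends when a superstep produces no messages, which plays the role of the explicit iteration termination condition. In a plain MapReduce setting, each pass of this loop would typically be a separate job whose output is written out and read back in by the next job, which is the overhead that explicit iteration support aims to avoid.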

Keywords/Search Tags:

large scale graph, distributed, MapReduce model, BSP model, Hadoop framework

Related items

1	Research Of Distributed Remote Sensing Image Processing Based On Hadoop
2	Research On The Large Scale Distributed Hydrologic Model
3	Incremental Parallel Graph Query For Large Graph Data
4	Research On Several Distributed Graph Processing Algorithms And A Unified Graph Programming Framework
5	Research On Large-scale Graph Data Similarity Query And Classification Techniques
6	Could Computing Model Base On Hadoop And Meteorological Application
7	Community Structure Detection Based Subsystem Decomposition And Distributed Model Predictive Control Of Large Scale System
8	Research Of Land Use Change Prediction Based On CA-Markov Model In Hadoop Environment
9	Research On The Thunderstorm Data Clustering And Thunderstorm Prediction Model Based On The Hadoop Platform
10	A Remote Sensing Products Processing System Based On Hadoop