
The Design And Implementation Of Large Scale Graph Processing System Based On Improved Hadoop

Posted on: 2020-08-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y P Zhang | Full Text: PDF
GTID: 2370330575487082 | Subject: Software engineering
Abstract/Summary:
"Intelligent transportation" and "social networks" are becoming increasingly popular, and processing the complex graph structures behind these fields has become an urgent problem. For example, finding the shortest route between two locations in intelligent transportation can be abstracted as computing the shortest path between two vertices in a large-scale graph, and judging whether two users of a social network are socially related can be abstracted as deciding whether one vertex is reachable from another in a large-scale graph. With the development of "big data" and "cloud computing", graphs keep growing in scale, and a single computer node can no longer store a large-scale graph at all. Distributed storage architectures have therefore been proposed.

Graph computing currently centers on two computing models. One is the MapReduce model, which decomposes graph computation into a map stage and a reduce stage; the well-known Hadoop is a distributed framework that implements it. The other is the BSP model, which is implemented by the distributed framework Hama. A comparison of the two models shows that MapReduce offers a higher level of abstraction and generality, and its interface is mature and easy to program against, but it does not support explicit iteration or real-time computing. The BSP model introduces the concept of the "superstep" to accelerate computation, but it places high memory demands on the computing nodes in the cluster. This thesis therefore combines the advantages of both, improving Hadoop with the characteristics of "MapReduce model + BSP model" so that it becomes a distributed graph processing framework with explicit iteration.

Some earlier work improved the first generation of Hadoop and successfully added support for explicit iteration, which improved efficiency. However, the first generation of Hadoop now has very few users. The second generation of Hadoop solves two problems of the first generation by introducing YARN. First, scalability: JobTracker's combined resource management and job control functions become the bottleneck that restricts system expansion. Second, JobTracker is a single point of failure, and the whole cluster becomes unavailable once it fails. This thesis therefore builds its improvements on the second generation of Hadoop. On top of the improved Hadoop framework, it implements a single-source shortest path algorithm, a reachability query algorithm, and an optimal path algorithm for time-series graphs, and releases a large-scale graph processing system for users. The work is of significant value to researchers in graph computing.
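
To make "explicit iteration" concrete, the following is a minimal single-process Python sketch of single-source shortest path written as repeated map and reduce rounds, where each round plays the role of one BSP superstep and the loop stops when no distance changes. It is an illustration only and not the system described in the thesis: the function names (`map_phase`, `reduce_phase`, `sssp`) and the adjacency-list format are assumptions, and nothing here is distributed or uses Hadoop or Hama APIs.

```python
import math
from collections import defaultdict


def map_phase(graph, dist):
    """Map: every vertex with a known distance proposes a distance to each neighbour."""
    for u, edges in graph.items():
        if dist[u] == math.inf:
            continue  # unreached vertices have nothing to send yet
        for v, weight in edges:
            yield v, dist[u] + weight


def reduce_phase(messages, dist):
    """Reduce: keep the smallest proposal per vertex and report whether anything improved."""
    best = defaultdict(lambda: math.inf)
    for v, candidate in messages:
        best[v] = min(best[v], candidate)
    new_dist = dict(dist)
    changed = False
    for v, candidate in best.items():
        if candidate < new_dist[v]:
            new_dist[v] = candidate
            changed = True
    return new_dist, changed


def sssp(graph, source):
    """Run map/reduce rounds until a round changes no distance (explicit iteration).

    Assumes non-negative edge weights and that every vertex appears as a key in graph.
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    rounds = 0
    changed = True
    while changed:
        dist, changed = reduce_phase(map_phase(graph, dist), dist)
        rounds += 1
    return dist, rounds


# Hypothetical example graph: vertex -> list of (neighbour, edge weight).
example_graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2)],
    "d": [],
    "e": [],
}
distances, rounds = sssp(example_graph, "a")
```

In this sketch a reachability query falls out of the same loop: a vertex is reachable from the source exactly when its final distance is finite, so `"e"` in the example graph is unreachable from `"a"`. In plain MapReduce each round would be a separate job with its state written to and re-read from storage, which is the overhead that explicit iteration support aims to avoid.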
Keywords/Search Tags: large scale graph, distributed, MapReduce model, BSP model, Hadoop framework