Font Size: a A A

Research And Implementation Of Big Data Processing Algorithm Based On Parallel Graph Partitioning

Posted on:2019-07-18Degree:MasterType:Thesis
Country:ChinaCandidate:G Q WangFull Text:PDF
GTID:2428330548969388Subject:Engineering
Abstract/Summary:PDF Full Text Request
With the increasing prosperity of the Internet,the data on the Internet is getting bigger and bigger,and the data becomes more and more complicated.How to obtain useful information in huge amounts of data has become a very urgent issue.In response to massive data processing and analysis,distributed computing platform came into being.Put data on the distributed platform for storage,so that you can solve the problem of limited storage capacity of a single machine,while another advantage of the distributed platform is mobile computing,data is divided into multiple machines on the storage,the use of this machine Computing power to handle the data stored locally,the cluster parallel processing of data,thereby accelerating the processing of large amounts of data.Graph data calculation is one of the core problems in big data processing.Balance graph division is a difficult point in graph calculation.Balance chart division in social networks,biological networks,road networks,etc.have a very wide range of applications.In the past research,there are a lot of algorithms for graph demarcation,most of them need to traverse the whole graph.The distributed system will divide the data and store it on more than one machine.Traversing the whole graph will certainly increase the data The cost of communication,and traversing the whole map data is impractical for the current large-scale data.In 2013,Rahimian published the graph partitioning algorithm as a processing unit with a single vertex.In the calculation,it only needs to visit the vertex's immediate neighbor vertices and a few random vertices,and does not need full-graph access.The communication cost between each machine in the cluster Is undoubtedly greatly reduced,thereby enhancing the algorithm's processing power.A good graph partitioning algorithm combined with a more usable distributed platform has undoubtedly greatly improved the practicality of the algorithm.The new programming framework implements the algorithm flow and cannot use the ideas of the original algorithm.It must be redesigned according to the new computing framework.There will be memory problems,algorithmic flow design and so on.According to the attributes of the graph,the relationship discovery ability is added,and the ability of the relationship discovery in the division results obtained by the algorithm rises.On the premise of ensuring that the graph partitioning algorithm can get a good cut edge,this paper proposes a parallel graph partitioning algorithm based on MapReduce programming model,and adds the calculation of the ability of the subgraph relations in the partitioning.At the end of this paper,we evaluate the speedup,scalability and scalability of the parallel algorithm by using different data sets and parallel algorithm performance evaluation methods,and verify the effectiveness and accuracy of the algorithm.
Keywords/Search Tags:Big data, Graph partition, Cut edge, Parallel graph partitioning algorithm
PDF Full Text Request
Related items