Research And Implementation Of Parallel Algorithms For Graph Mining

Posted on: 2016-03-24  Degree: Master  Type: Thesis
Country: China  Candidate: H X Wang  Full Text: PDF
GTID: 2298330467992902  Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the information age, the volume of information has grown explosively. Graphs have grown in size accordingly, and traditional graph mining algorithms can no longer meet the demand. Parallel computing can address this problem effectively, and cloud computing platforms such as Hadoop, Hama and Spark provide good support for parallel computation over big data and are widely used. However, for a given kind of graph mining algorithm and a given data size, which of these platforms is the most suitable for implementing the algorithm remains a question worth studying. To address this question, this thesis studies two aspects: the cloud computing platforms and the parallelization of graph mining algorithms.
For the cloud computing platforms, the thesis analyzes their system architectures, their programming models and their key technologies. The underlying mechanisms of the MapReduce, BSP and Spark programming frameworks are studied in principle, and on this basis the graph mining algorithms are designed and implemented in parallel.
For the parallelization of graph mining algorithms, the thesis divides the algorithms into three categories: graph sorting algorithms, graph clustering algorithms and graph attribute analysis algorithms. After studying the principles of the algorithms, it implements them on each of the three cloud computing platforms, sets up an experimental environment and runs performance tests. Comparing the results shows that programs based on Spark or Hama are more efficient than those based on Hadoop, and that Spark shows better scalability than Hama.
Finally, based on these results, the thesis implements a parallel data analysis system on Spark, and the system is shown to be more efficient than a traditional platform based on MapReduce.
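To make the BSP programming model mentioned above more concrete, here is a minimal single-process sketch of a graph ranking algorithm written as synchronous supersteps. The abstract does not name the specific algorithms; PageRank is assumed here only as a representative of the graph sorting category, and the function name pagerank_bsp, the damping value and the superstep count are all hypothetical choices for illustration. The code uses none of the Hadoop, Hama or Spark APIs and is not the thesis's implementation.

```python
from collections import defaultdict


def pagerank_bsp(edges, damping=0.85, supersteps=20):
    """PageRank as synchronous supersteps (illustrative sketch only).

    edges: iterable of (source, destination) vertex pairs.
    Returns a dict mapping each vertex to its rank; ranks sum to 1.
    """
    # Build out-neighbour lists and the vertex set.
    out = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out[src].append(dst)
        vertices.update((src, dst))

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        # Send phase: every vertex splits its rank among its out-neighbours.
        inbox = defaultdict(float)
        dangling = 0.0  # rank held by vertices with no outgoing edges
        for v in vertices:
            if out[v]:
                share = rank[v] / len(out[v])
                for w in out[v]:
                    inbox[w] += share
            else:
                dangling += rank[v]

        # Barrier: all messages are delivered before any vertex updates.
        # Compute phase: each vertex combines its incoming messages.
        rank = {
            v: (1.0 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank


if __name__ == "__main__":
    ranks = pagerank_bsp([("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")])
    for vertex, value in sorted(ranks.items()):
        print(vertex, round(value, 4))
```

In a distributed setting, the send phase and the compute phase would run on separate workers with the barrier enforced by the framework. In a MapReduce formulation each superstep would typically become one job, which is one plausible reason for the efficiency gap reported above, although the abstract itself does not state the cause.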
Keywords/Search Tags: cloud computing, Hadoop, Hama, Spark, graph mining