Research And Implementation Of Parallel Algorithms For Graph Mining

Posted on:2016-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:H X Wang

GTID:2298330467992902

Subject:Computer Science and Technology

Abstract/Summary:

With the advent of the information age, information is growing explosively and graphs keep getting larger, so traditional graph mining algorithms can no longer meet demand. On the one hand, parallel computing can effectively address this problem, and cloud computing platforms such as Hadoop, Hama and Spark support parallel computing on big data well and are widely used. On the other hand, for a given kind of graph mining algorithm at a given data scale, which of these platforms is best suited to implementing it is a question worth studying. To address these problems, this paper studies two aspects: the cloud computing platforms and the parallelization of graph mining algorithms.

For the cloud computing platforms, this paper mainly analyzes their system architectures, the corresponding programming models and their key technologies. It studies the underlying mechanisms of the MapReduce, BSP and Spark programming frameworks in principle and, on this basis, designs and implements parallel graph mining algorithms.

For the parallelization of graph mining algorithms, this paper divides graph mining algorithms into three categories: graph sorting algorithms, graph clustering algorithms and graph attribute analysis algorithms. After studying the principles of these algorithms, it implements each of them on the three cloud computing platforms, sets up an experimental environment and runs performance tests. Comparing the results shows that programs based on Spark or Hama are more efficient than those based on Hadoop, and that Spark scales better than Hama.
Finally, based on these research results, this paper implements a Spark-based parallel data analysis system, whose efficiency is shown to be better than that of a traditional MapReduce-based platform.
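To make the BSP model and the graph sorting category a little more concrete, here is a minimal Python sketch. It is not the implementation used in this paper. It assumes that "graph sorting" refers to ranking algorithms such as PageRank, and the function name `pagerank_bsp`, the adjacency-dict input, the damping factor of 0.85 and the even redistribution of rank from vertices without out-edges are all illustrative choices. Each loop iteration stands for one BSP superstep: every vertex sends its rank share to its neighbours as messages, a barrier ensures all messages are delivered, and then every vertex updates its rank.

```python
from collections import defaultdict


def pagerank_bsp(edges, num_supersteps=30, damping=0.85):
    """Hypothetical vertex-centric PageRank organized as BSP supersteps.

    edges maps every vertex id to a list of out-neighbour ids; every
    target must also appear as a key. Rank held by vertices with no
    out-edges is spread evenly over all vertices (an assumed convention).
    """
    vertices = list(edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(num_supersteps):
        # Compute/communicate phase: each vertex messages its neighbours.
        inbox = defaultdict(float)
        dangling_mass = 0.0
        for v in vertices:
            out = edges[v]
            if out:
                share = rank[v] / len(out)
                for w in out:
                    inbox[w] += share
            else:
                dangling_mass += rank[v]

        # Barrier: all messages of this superstep are delivered before
        # any vertex computes its rank for the next superstep.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling_mass / n)
            for v in vertices
        }
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, score in sorted(pagerank_bsp(graph).items()):
        print(vertex, round(score, 4))
```

For the three-vertex example graph the ranks sum to 1 and vertex `c` ends up ranked highest. The loop structure also hints at the platform differences the abstract compares: on Hadoop MapReduce an iterative algorithm like this is typically run as one job per iteration, with the graph re-read from storage each time, whereas Hama keeps vertex state in memory across supersteps and Spark can cache the rank data in memory between iterations.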

Keywords/Search Tags:

cloud computing, Hadoop, Hama, Spark, graph mining

Related items

1	Parallel Algorithms Research Based On Hadoop And Hama
2	Research And Implementation Of Graph Data Processing Technology Based On Cloud Computing Environment
3	Study On Performance Of Hama Computing Platform
4	Research On Web Data Mining Algorithms In Cloud Computing Environment
5	Parallel Data Mining Algorithm Research In Cloud
6	Research And Design Of Data Mining System For Tcm Disease Based On Cloud Computing Environment
7	The Research And Implementation Of Bayesian Classification Algorithm In The Text Based On Spark Platform
8	The Design And Implementation Of Data Mining System On Yarn
9	Study On Parallel Algorithm Of Large-scale Numerical Calculation In Cloud Computing Environment
10	The Application Research Of Weblog Mining Based On Cloud Computing