Frequent Subgraph Mining In Graph Databases Based On MapReduce

Posted on:2017-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:K Wang

Full Text:PDF

GTID:2348330503972494

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

In recent years, with the rapid development of social network, large amounts of data are encoded as graphs data. Graph, with its powerful expression ability, is widely applied in chemical informatics, bioinformatics, medicine and social sciences. Frequent subgraph mining technologies are the important research techniques in these fields. Frequent subgraph mining is aim to find patterns of subgraph whoes occurrences is no less than a given support threshold. The increasing size of graph database is challenging traditional methods of frequent subgraph mining.In this paper, we propose a new approach based on MapReduce to mine frequent subgraph patterns from graph databases in large sizes. There are two rounds of MapReduce in our method. We just mine the local frequent subgraphs in each node at the first round. Then we collect the results and filter some redundant graphs to obtain a set of global candidate subgraphs. Second round of MapReduce is to calculate the global frequency of each graph in the set of global candidate subgraphs. We divide the local frequent subgraph mining into two steps, and use a embedding set to solve the problem of copy graphs. After the second round MapReduce, the k-size of global frequent subgraphs will be the input of next iteration, so that we can avoid some unnecessary computation in some nodes. It improves the efficiency of the system.We implemented our system on the Hadoop platform. There are 6 graph data sets in our experiment. And each data set is tested in three different support thresholds on 2 nodes and 4 nodes.The experimental results show that our method is similar to the linear increasing trend in the time requirement when dealing with these graph data sets. Compared with other similar algorithms, it has better generality and higher efficiency.

Keywords/Search Tags:

data mining, MapReduce, parallel computing, frequent subgraph mining, big data

PDF Full Text Request

Related items

1	Research Of Efficient Frequent Subgraph Mining On Big Graph
2	Parallel Frequent Itemset Mining Based On MapReduce
3	Research On Differentially Private Frequent Subgraph Mining
4	Frequent Subgraph Mining And Its Application In Compound Property Prediction
5	Research And Application Of Mining Graph Structure Algorithm
6	MapReduce-based Parallel Data Mining Services For TCM
7	Algorithms For Mining Frequent Subgraphs And Its Applications In Biological Network
8	Research And Application Of Frequent Subgraph Mining Algorithm
9	Study And Implementation On Techniques Of Parallel Mining Of Frequent Closed Sequences Based On Vertical Segmentation
10	Research On Algorithms For Frequent Subgraph Mining