Font Size: a A A

Frequent Subgraph Mining In Graph Databases Based On MapReduce

Posted on:2017-08-02Degree:MasterType:Thesis
Country:ChinaCandidate:K WangFull Text:PDF
GTID:2348330503972494Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years, with the rapid development of social network, large amounts of data are encoded as graphs data. Graph, with its powerful expression ability, is widely applied in chemical informatics, bioinformatics, medicine and social sciences. Frequent subgraph mining technologies are the important research techniques in these fields. Frequent subgraph mining is aim to find patterns of subgraph whoes occurrences is no less than a given support threshold. The increasing size of graph database is challenging traditional methods of frequent subgraph mining.In this paper, we propose a new approach based on MapReduce to mine frequent subgraph patterns from graph databases in large sizes. There are two rounds of MapReduce in our method. We just mine the local frequent subgraphs in each node at the first round. Then we collect the results and filter some redundant graphs to obtain a set of global candidate subgraphs. Second round of MapReduce is to calculate the global frequency of each graph in the set of global candidate subgraphs. We divide the local frequent subgraph mining into two steps, and use a embedding set to solve the problem of copy graphs. After the second round MapReduce, the k-size of global frequent subgraphs will be the input of next iteration, so that we can avoid some unnecessary computation in some nodes. It improves the efficiency of the system.We implemented our system on the Hadoop platform. There are 6 graph data sets in our experiment. And each data set is tested in three different support thresholds on 2 nodes and 4 nodes.The experimental results show that our method is similar to the linear increasing trend in the time requirement when dealing with these graph data sets. Compared with other similar algorithms, it has better generality and higher efficiency.
Keywords/Search Tags:data mining, MapReduce, parallel computing, frequent subgraph mining, big data
PDF Full Text Request
Related items