Research On Algorithms For Frequent Subgraph Mining Based On Large Graph Databases

Posted on:2011-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:C Zheng

Full Text:PDF

GTID:2178330338991327

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

The frequent subgraph mining is a technology of obtaining frequent subgraph patterns from graph databases, the result can be used in the classification and clustering research of the graph databases. It is useful for client to study the feature of graph sets and build search index of graph databases.At present, the existing graph mining algorithms typically assume that the graph databases can fit into main memory. As many large graph databases cannot satisfy this condition, when faced with this situation,the effective method is partitioning large graph databases to many small graph databases which can fit into main memory and then using existing graph mining algorithms for these fragments.However, this method has the problems of producing redundancy subgraphs and repeat scanning the graph databases, to solve these problems, this paper provides an efficient frequent subgraph mining algorithm and uses it to large graph databases mining.Firstly, to resolve the problem of algorithm gSpan produces redundancy subgraph when extending frequent subgraph,we present an improved algorithm CSGM using ADI++ storage structure,which can handle larger graph databases and guarantee that each extension is a canonical code.It not only avoids calculating the support of the non-canonical code graphs but also avoids the calculation of whether an input code is a canonical code,the new algorithm reduces the computation.Secondly, to resolve the problem of PartGraphMining algorithm,a large graph databases mining algorithm , needs to repeat scan the database when handling large graph databases ,we present an improved algorithm IPGM. Through using CSGM algorithm to mine the frequent subgraph after being partitioned, meanwhile using hash table to store graph's hash address, it can avoid repeat scanning the databases and reduce the number of judging subgraph isomorphismFinally,the experiments verify the correctness, the efficiency and the ability of using the CSGM algorithm and IPGM algorithm to handle large graph databases .

Keywords/Search Tags:

Frequent subgraph, large graph databases, ADI++ storage structure, Minimum DFS code, Repeat scan

PDF Full Text Request

Related items

1	Mining Frequent Subgraph Based On Pre-clipping In Uncertain Graph Databases
2	Research And Application Of Mining Graph Structure Algorithm
3	Research On Key Techniques Of Large Scale Graph Mining Based On GPU Hybrid Architecture
4	Mining Algorithm Based On Frequent Sub-graph Of The Multi-layer Index Structure
5	Research On Exact Subgraph Search Technology In Graph Database
6	Research Of Query Technologies On Graph Data
7	An Efficient Algorithm Of Mining Frequent Subgraph Patterns In Uncertain Graph Database
8	Frequent Subgraph Mining In Graph Databases Based On MapReduce
9	Frequent Subgraph Mining And Its Application In Compound Property Prediction
10	Frequent Subgraph Mining Based On Partially Labeled Graph