Font Size: a A A

Research On Algorithms For Frequent Subgraph Mining Based On Large Graph Databases

Posted on:2011-07-29Degree:MasterType:Thesis
Country:ChinaCandidate:C ZhengFull Text:PDF
GTID:2178330338991327Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
The frequent subgraph mining is a technology of obtaining frequent subgraph patterns from graph databases, the result can be used in the classification and clustering research of the graph databases. It is useful for client to study the feature of graph sets and build search index of graph databases.At present, the existing graph mining algorithms typically assume that the graph databases can fit into main memory. As many large graph databases cannot satisfy this condition, when faced with this situation,the effective method is partitioning large graph databases to many small graph databases which can fit into main memory and then using existing graph mining algorithms for these fragments.However, this method has the problems of producing redundancy subgraphs and repeat scanning the graph databases, to solve these problems, this paper provides an efficient frequent subgraph mining algorithm and uses it to large graph databases mining.Firstly, to resolve the problem of algorithm gSpan produces redundancy subgraph when extending frequent subgraph,we present an improved algorithm CSGM using ADI++ storage structure,which can handle larger graph databases and guarantee that each extension is a canonical code.It not only avoids calculating the support of the non-canonical code graphs but also avoids the calculation of whether an input code is a canonical code,the new algorithm reduces the computation.Secondly, to resolve the problem of PartGraphMining algorithm,a large graph databases mining algorithm , needs to repeat scan the database when handling large graph databases ,we present an improved algorithm IPGM. Through using CSGM algorithm to mine the frequent subgraph after being partitioned, meanwhile using hash table to store graph's hash address, it can avoid repeat scanning the databases and reduce the number of judging subgraph isomorphismFinally,the experiments verify the correctness, the efficiency and the ability of using the CSGM algorithm and IPGM algorithm to handle large graph databases .
Keywords/Search Tags:Frequent subgraph, large graph databases, ADI++ storage structure, Minimum DFS code, Repeat scan
PDF Full Text Request
Related items