Font Size: a A A

A DFS-Based Algorithm For Mining Frequent Connected Induced Subgraphs

Posted on:2010-11-21Degree:MasterType:Thesis
Country:ChinaCandidate:W Y LiuFull Text:PDF
GTID:2178330332987439Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As a general data structure, graphs have become increasingly important in modeling sophisticated structures and their interactions, with broad applications including chemical informatics, bioinformatics, computer vision, video indexing, text retrieval, and Web analysis. Mining frequent subgraph patterns for further characterization, discrimination, classification, and cluster analysis becomes an important task. With the increasing demand on the analysis of large amount of structured data, frequent subgraph mining has become an active and important theme in data mining.In this paper, some works are done in the frequent subgraph mining. Firstly, some classical graph-based algorithms for mining frequent subgraphs are systematically studied and comprehensively summarized. Secondly, this paper proposes a new algorithm named CISM (Connected Induce Subgraph Mining) to frequent connected induced subgraphs in a given graph dataset GD. It uses a vertex-based candidate generation method that increases the subgraph size by one vertex at a time, as well, uses the depth-first search (DFS) strategy. CISM initially finds all the frequent vertices, removes the infrequent ones and their adjacent edges from GD, which will reduce the scale of this problem. Then, it finds all the frequent edges, and sorts them in descending order by their frequency. Then, it starts the main computational loop. During each iteration, it starts with a frequent edge, generates all this edge's k-size candidates, and then counts the frequency for each of these candidates, and prunes subgraphs that do not satisfy the support constraint, obtains the frequent ones, and then finds the (k+1)-size frequent subgraphs by the k-size frequent ones, repeats until there are no more frequent ones. And then it removes this edge from GD, and goes to the next iteration. CISM is able to reduce the redundant subgraphs, improves its efficiency. Finally, we apply CISM to the PTE chemical compound dataset. Experiments show that CISM algorithm achieves good performance.
Keywords/Search Tags:Graph Mining, Frequent Induced Subgraph, Graph Isomorphism, Algorithm
PDF Full Text Request
Related items