Font Size: a A A

Research On Graph Frequent Substructure Mining Method

Posted on:2018-02-26Degree:MasterType:Thesis
Country:ChinaCandidate:H X LiFull Text:PDF
GTID:2348330569486411Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology such as Internet and database technology,the network produced and accumulated a large number of semi-structured data,for example,Xml documents,Web pages,chemical compounds and biological molecules.These complex data often hide rich knowledge.Frequent pattern mining can find potential rules and patterns,and is widely used in practical applications.As an important data structure,graph can express rich semantics,which is suitable for semi-structured data modeling.Frequent pattern mining of semistructured data can be transformed into frequent substructure mining of graphs.The frequent substructure mining of graphs can be divided into two directions,frequent subtree mining and frequent subgraph mining.There are two kinds of algorithms for mining frequent subtrees,which are based on the idea of "generating candidate substructure and computating support degree" and "projection database".There are two major flaws of the algorithms based on the idea of "generating candidate substructure and computating support degree",it cannot generate all candidate subtrees and will output large scale frequent subtrees with redundant information.In this paper,we proposed GAST(Get All Subrees)and MCRP(Mining Coverage Pattern)algorithm for above two flaws.In order to avoid the omission of information,GAST algorithm uses the width and child number coding method to encode the tree,and then uses the edge expansion based on maximum prefix coding sequence to generate all candidate subtrees.In order to reduce the number of frequent subtrees and the redundancy of the information,MCRP algorithm,on the basis of GAST algorithm,only outputs the frequent subtrees satisfying the -coverage condition by mining the coverage patterns.For the frequent substructure mining of graphs,the current algorithm is mainly based on the idea of "generateing candidate substructure and computating support degree".As this algorithm need to judge subgraph isomorphism during calculating candidate subgraph supporting degree,so the computation complexity of this algorithm is higher.Aiming at the shortcomings of the traditional frequent subgraph mining algorithm,this paper proposed the MFSGBPC(Mining Frequent SubGraph Based on Projection Coding)algorithm based on the idea of "projection database" to mine the frequent subgraphs.The algorithm avoids judging subgraph isomorphism and improved the efficiency.The theoretical and experimental results show that the MCRP algorithm is more effective and can generate all candidate subtrees without redundancy compared with traditional methods.The MFSGBPC algorithm can avoid generating candidate subgraphs and calculating candidate subgraphs supporting degree so that the efficiency has improved.
Keywords/Search Tags:substructure of graph, frequent subtree, frequent subgraph, width and child number coding, maximum prefix coding sequence, coverage pattern, projection database
PDF Full Text Request
Related items