Font Size: a A A

A Research On A New Algorithm For Mining Frequent Subtrees

Posted on:2007-10-15Degree:MasterType:Thesis
Country:ChinaCandidate:W J XieFull Text:PDF
GTID:2178360185980669Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the explosion of information, we are overwhelmed in the information garbage. How to discovery the useful knowledge from such an informational bin is the main purpose of DM in the past decades. The original data target of DM is the structured tables or transaction DBs. Up to date, the techniques in the field are developed to the real applications. However, they can't be applied to many other applications such as semi-structured and non-structured data as these techniques can't model the data fromats correctly. Graph structure can model the complicated relationships between entities and it also can be used in such domains. As far, graph based DM becomes a novel hot topic in the field of DM. Graph based DM has a variety of applications such as Web mining, spatial data mining, substructure mining in the bioinformatics, drug molecula design as well as it's functional prediction, etc. Trees are acyclic graphs, frequent subtree mining has the theoretical and practical significance.Our contribution in this paper includes: (1) based on the usual frequent subtree definition, we propose a novel definition of frequent subtree mining, (2) propose a tree isomorphism based method to facilitate the support and frequent counting of the candidates, (3) in order to speed-up the visit of the database, we propose a table representation of a forest, (4) propose a novel candidate generation method based on the data tree that can significantly reduce the invalid tree matchings during the tree counting, (5) propose algorithm FSubtreeM which employs the above techniques to mine frequent induced free subtrees from a forest consisting of free trees.Performance study indicates that FSubtreeM can effectively discovery the frequent induced free subtrees from the database carcinogen, it also can form some interesting association rules from the frequent subtrees which has some theoretical and practical significance.
Keywords/Search Tags:frequent subtree mining, isomorphism, coding, canonicalization, tree center
PDF Full Text Request
Related items