Font Size: a A A

Research On Frequent Subtree Mining And Pruning Strategies

Posted on:2011-04-28Degree:MasterType:Thesis
Country:ChinaCandidate:D J DengFull Text:PDF
GTID:2178360305465519Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The tree mining task, which discovers frequent subtrees from a large tree database, is an important data mining problem. It has attracted considerable attention from database practitioners and researchers because of its broad applications in many areas such as analysis of XML database, molecular database, multi-table relational database and graph database etc.Most of the current algorithms for mining frequent subtree are based on Apriori property, which have primary two steps, candidate pattern generation and frequency counting. In each candidate generation step, algorithm iteratively generate all candidate (k+l)-subtrees from all frequent k-subtrees. The main bottleneck of these algorithms is that huge number of candidate subtrees could be generated and the cost of candidate generation and frequency counting is very expensive. In fact, a lot of candidate subtrees is infrequent or not exist in database.In this paper, based on the analysis of the existing pruning strategies used in frequent subtree mining algorithms, we explore a novel pruning strategy F2SC (Frequent 2-Subtree Checking). F2SC starts its work by keeping all the frequent 2-subtrees in a hash table, in candidate k-subtrees (k≥3) generation step, prune the infrequent candidate patterns by checking the hash table. F2SC can be used in all the frequent subtree mining algorithms based on Apriori property. For effectiveness testing reason, we optimize TreeMiner and present the improved algorithm TMP, which uses F2SC to prune invalid candidate subtree patterns. We also evaluate the effectiveness of the pruning strategy in comparison to the previous works. The experimental result shows that the TreeMiner get great benefits from the proposed pruning strategy, main changes are the following two aspects, decrease the generation of candidate patterns and reduce the total cost of frequency counting effectively.
Keywords/Search Tags:frequent subtree mining, frequent subtree patterns, candidate pattern, frequency counting, pruning strategy
PDF Full Text Request
Related items