Font Size: a A A

Research On Support Counting And The Algorithm Improvement In Treeminer

Posted on:2013-03-16Degree:MasterType:Thesis
Country:ChinaCandidate:C ChenFull Text:PDF
GTID:2248330371487114Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of information era, many research methods have been introduced to handle data explosion problem and that cause the rapid development of data mining disciplines. Data mining can be extended to various application fields such as computer networks, Web mining, bio-informatics, XML document mining and other fields. The complex structure of data in these areas almost can be represented by using tree or graph and the research of frequent subtree mining methods has become a hot spot to extract useful information on these subjects.In this paper, we do research in TreeMiner algorithm, which is one method of frequent subtree mining.,we improve the algorithm through support counting. TreeMiner algorithm use the method of joining scope list to calculate the degree of support, so the improvements specific to the research of joining scope list.The main work of this paper includes the following aspects:First, the paper presents the improved algorithm TreeMiner+based on suspending the process of scope list joining.After generating candidate trees, TreeMiner algorithm prunes unfrequent subtrees by joining scope-lists. The process will do the judgment of the frequent subtree after finishing all execution which uses scope-list joining, then it will prune based on the support calculating of subtrees. The judgement of support growth is the judgement about whether the subtree occurs in a tree, which is unrelated with the number of occurrences in the tree. In the improved algorithm TreeMiner+, we will pause the operation when it meets the first successful scope-list join and increase the support, and then operate the following scope-list join. Once we determine the frequency or non-frequency of the subtree, it is to be pruned or recorded as we wish. Also TreeMiner+uses a specific order of scope list for joining, which could be faster to determine the frequency of subtrees. The experimental results show that TreeMiner+can effectively improve the time and space performance.Secondly, the paper presents improved algorithm TreeMiner*. According to the needs of practical application, we take into account the number of occurrences of the subtree in a tree which should be greater than once in support calculation, the scope list joining needs to successfully several times to resulting in support for the degree of increase. Accordingly, the paper defines the concept of N-frequent subtree, then designed and implemented the TreeMiner for mining frequent trees which occurrent at least N times in a tree. The experimental results show that TreeMiner is feasible and effective.
Keywords/Search Tags:frequent subtree mining, support, equivalence class, scope-listjoining, order joining, TreeMiner~+, N-frequent subtree, TreeMiner~*
PDF Full Text Request
Related items